

Self-Taught Agentic Long Context Understanding

February 21, 2025
Authors: Yufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Jingbo Shang, Zicheng Liu, Emad Barsoum
cs.AI

Abstract

Answering complex, long-context questions remains a major challenge for large language models (LLMs), as it requires effective question clarification and context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a framework designed to enhance an LLM's understanding of such queries by integrating targeted self-clarification with contextual grounding within an agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC), where models refine their understanding through self-generated clarification questions and corresponding contextual groundings. By scaling inference as a tree search where each node represents a CoC step, we achieve 97.8% answer recall on NarrativeQA with a search depth of up to three and a branching factor of eight. To amortize the high cost of this search process into training, we leverage the per-step preference pairs obtained from the CoC workflow and perform two-stage model finetuning: (1) supervised finetuning to learn effective decomposition strategies, and (2) direct preference optimization to enhance reasoning quality. This enables AgenticLU models to generate clarifications and retrieve relevant context effectively and efficiently in a single inference pass. Extensive experiments across seven long-context tasks demonstrate that AgenticLU significantly outperforms state-of-the-art prompting methods and specialized long-context LLMs, achieving robust multi-hop reasoning while sustaining consistent performance as context length grows.
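
The abstract describes two concrete mechanisms: a CoC tree search used to collect high-recall reasoning traces, and the distillation of that search into preference data for two-stage finetuning. The Python sketch below illustrates both under stated assumptions: the llm client with its propose_clarification, ground_in_context, and answer helpers, the scorer function, and the pair-construction heuristic are all hypothetical stand-ins, not the authors' published implementation.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class CoCNode:
        # One Chain-of-Clarifications step: a self-generated clarification
        # question plus the context passages retrieved to ground it.
        clarification: str
        grounding: str

    def coc_tree_search(llm, question: str, context: str,
                        depth: int = 3, branching: int = 8):
        # Expand CoC steps as a tree (depth up to three, branching factor
        # eight, matching the abstract's NarrativeQA setting) and collect a
        # candidate answer at every leaf. The reported 97.8% is answer
        # recall: at least one leaf path yields a correct answer.
        candidates = []

        def expand(path: list, d: int) -> None:
            if d == depth:
                candidates.append((path, llm.answer(question, context, path)))
                return
            for _ in range(branching):
                clar = llm.propose_clarification(question, context, path)
                ground = llm.ground_in_context(clar, context)
                expand(path + [CoCNode(clar, ground)], d + 1)

        expand([], 0)
        return candidates

    def build_training_data(candidates, gold_answer: str,
                            scorer: Callable[[str, str], float]):
        # Turn scored CoC paths into finetuning data: the best path becomes
        # a supervised finetuning target, and (better, worse) pairs become
        # DPO examples. This exact pairing heuristic is an assumption made
        # here for illustration; the abstract only states that per-step
        # preference pairs are obtained from the CoC workflow.
        ranked = sorted(candidates,
                        key=lambda c: scorer(c[1], gold_answer), reverse=True)
        sft_example = ranked[0]
        dpo_pairs = [(ranked[0], worse) for worse in ranked[1:]]
        return sft_example, dpo_pairs

After supervised finetuning on the best paths and DPO on the pairs, the model produces a single clarify-ground-answer chain per query rather than searching up to 8^3 = 512 leaf paths, which is the amortization the abstract describes.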

