Self-Taught Agentic Long Context Understanding
February 21, 2025
Authors: Yufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Jingbo Shang, Zicheng Liu, Emad Barsoum
cs.AI
Abstract
Answering complex, long-context questions remains a major challenge for large
language models (LLMs) as it requires effective question clarifications and
context retrieval. We propose Agentic Long-Context Understanding (AgenticLU), a
framework designed to enhance an LLM's understanding of such queries by
integrating targeted self-clarification with contextual grounding within an
agentic workflow. At the core of AgenticLU is Chain-of-Clarifications (CoC),
where models refine their understanding through self-generated clarification
questions and corresponding contextual groundings. By scaling inference as a
tree search where each node represents a CoC step, we achieve 97.8% answer
recall on NarrativeQA with a search depth of up to three and a branching factor
of eight. To amortize the high cost of this search over training, we leverage
the per-step preference pairs obtained from the CoC workflow and
perform two-stage model finetuning: (1) supervised finetuning to learn
effective decomposition strategies, and (2) direct preference optimization to
enhance reasoning quality. This enables AgenticLU models to generate
clarifications and retrieve relevant context effectively and efficiently in a
single inference pass. Extensive experiments across seven long-context tasks
demonstrate that AgenticLU significantly outperforms state-of-the-art prompting
methods and specialized long-context LLMs, achieving robust multi-hop reasoning
while sustaining consistent performance as context length grows.
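
The abstract frames inference-time scaling as a tree search over Chain-of-Clarifications (CoC) steps, with a depth of up to three and a branching factor of eight. The sketch below illustrates that idea under stated assumptions: generate_clarification and retrieve_grounding are hypothetical placeholders for the model's self-clarification and contextual-grounding calls, and the CoCNode structure is illustrative rather than the authors' implementation.

```python
# Minimal sketch (not the authors' implementation) of scaling inference as a
# tree search over Chain-of-Clarifications (CoC) steps.
from dataclasses import dataclass, field

MAX_DEPTH = 3         # search depth of up to three (per the abstract)
BRANCHING_FACTOR = 8  # candidate clarifications generated at each node

def generate_clarification(question, context, i):
    # Placeholder: a real system would prompt the LLM to self-generate a
    # clarification question about `question` given the long `context`.
    return f"[clarification {i} of: {question}]"

def retrieve_grounding(clarification, context):
    # Placeholder: a real system would retrieve the passage that grounds the
    # clarification; here we simply return the first chunk of the context.
    return context[:200]

@dataclass
class CoCNode:
    question: str                      # original or clarified question
    grounding: str = ""                # retrieved supporting context
    children: list = field(default_factory=list)

def expand(node, context, depth=0):
    """Recursively expand a CoC node; returns the leaves of the search tree."""
    if depth == MAX_DEPTH:
        return [node]
    leaves = []
    for i in range(BRANCHING_FACTOR):
        clarification = generate_clarification(node.question, context, i)
        grounding = retrieve_grounding(clarification, context)
        child = CoCNode(question=clarification, grounding=grounding)
        node.children.append(child)
        leaves.extend(expand(child, context, depth + 1))
    return leaves

# Example usage on a toy input:
root = CoCNode(question="Who betrayed the protagonist?")
leaves = expand(root, context="(long document text ...)")
print(len(leaves))  # 8**3 = 512 leaves at full expansion (no pruning in this sketch)
```

In the paper's workflow, search paths that reach a correct answer versus those that do not supply the per-step preference pairs that feed the two-stage finetuning (supervised finetuning followed by direct preference optimization) described in the abstract.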