Chain-of-Retrieval Augmented Generation
January 24, 2025
Authors: Liang Wang, Haonan Chen, Nan Yang, Xiaolong Huang, Zhicheng Dou, Furu Wei
cs.AI
Abstract
This paper introduces an approach for training o1-like RAG models that
retrieve and reason over relevant information step by step before generating
the final answer. Conventional RAG methods usually perform a single retrieval
step before the generation process, which limits their effectiveness in
addressing complex queries due to imperfect retrieval results. In contrast, our
proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the
model to dynamically reformulate the query based on the evolving state. To
train CoRAG effectively, we utilize rejection sampling to automatically
generate intermediate retrieval chains, thereby augmenting existing RAG
datasets that only provide the correct final answer. At test time, we propose
various decoding strategies to scale the model's test-time compute by
controlling the length and number of sampled retrieval chains. Experimental
results across multiple benchmarks validate the efficacy of CoRAG, particularly
in multi-hop question answering tasks, where we observe an improvement of more
than 10 points in EM score over strong baselines. On the KILT benchmark,
CoRAG establishes a new state-of-the-art performance across a diverse range of
knowledge-intensive tasks. Furthermore, we offer comprehensive analyses to
understand the scaling behavior of CoRAG, laying the groundwork for future
research aimed at developing factual and grounded foundation models.Summary
AI-Generated Summary
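The rejection-sampling idea in the abstract can be sketched in a few lines: sample many retrieval chains (sub-query, retrieved evidence) and keep only those whose final answer matches the known gold answer, so the accepted chains become intermediate supervision for datasets that provide only the final answer. The toy corpus, exact-match retriever, and random sub-query generator below are illustrative stand-ins of our own (the paper uses an LLM and a learned retriever), not the authors' implementation.

```python
import random

# Hypothetical two-hop corpus; each entry is one retrievable fact.
CORPUS = {
    "who directed inception": "Inception was directed by Christopher Nolan.",
    "christopher nolan birth year": "Christopher Nolan was born in 1970.",
}

def retrieve(sub_query):
    # Toy retriever: exact-match lookup standing in for a dense retriever.
    return CORPUS.get(sub_query, "")

def sample_chain(max_steps, rng):
    # Sample one retrieval chain. In CoRAG an LLM would emit each
    # sub-query conditioned on the evidence gathered so far; a random
    # choice over corpus keys stands in for that generator here.
    state = []
    for _ in range(max_steps):
        sub_query = rng.choice(sorted(CORPUS))
        state.append((sub_query, retrieve(sub_query)))
    # Toy answer generator: the answer is derivable only if the chain
    # collected both hops of evidence.
    if {q for q, _ in state} == set(CORPUS):
        return state, "1970"
    return state, "unknown"

def rejection_sample(gold_answer, num_chains=32, max_steps=2, seed=0):
    # Keep only chains whose final answer matches the gold answer;
    # accepted chains can then be used as training supervision.
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_chains):
        chain, answer = sample_chain(max_steps, rng)
        if answer == gold_answer:
            accepted.append(chain)
    return accepted

chains = rejection_sample("1970")
print(f"accepted {len(chains)} of 32 sampled chains")
```

The same loop also illustrates the test-time knobs the abstract mentions: `max_steps` bounds the chain length and `num_chains` the number of sampled chains, which together control how much compute is spent per query.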