CoRAG: Collaborative Retrieval-Augmented Generation
April 2, 2025
Authors: Aashiq Muhamed, Mona Diab, Virginia Smith
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive
tasks, especially under few-shot learning constraints. We introduce CoRAG, a
framework extending RAG to collaborative settings, where clients jointly train
a shared model using a collaborative passage store. To evaluate CoRAG, we
introduce CRAB, a benchmark for collaborative homogeneous open-domain question
answering. Our experiments demonstrate that CoRAG consistently outperforms both
parametric collaborative learning methods and locally trained RAG models in
low-resource scenarios. Further analysis reveals the critical importance of
relevant passages within the shared store, the surprising benefits of
incorporating irrelevant passages, and the potential for hard negatives to
degrade performance. This introduces a novel consideration in
collaborative RAG: the trade-off between leveraging a collectively enriched
knowledge base and the potential risk of incorporating detrimental passages
from other clients. Our findings underscore the viability of CoRAG, while also
highlighting key design challenges and promising avenues for future research.
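The collaborative setting described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract. The collaborative passage store is modeled as a deduplicated union of every client's passages. A toy bag-of-words cosine retriever stands in for the dense retriever that a real RAG model would use. The shared model's parameters are combined with size-weighted federated averaging (FedAvg). The abstract does not specify an aggregation rule, and the names `build_collaborative_store`, `retrieve`, and `fedavg` are hypothetical. This is an illustration, not the CoRAG implementation.

```python
from collections import Counter
import math

# Hypothetical sketch: the store, retriever, and aggregation rule are assumptions,
# not the CoRAG paper's actual components.

def build_collaborative_store(client_passages):
    """Pool all clients' passages into one shared store (deduplicated, order kept)."""
    store, seen = [], set()
    for passages in client_passages:
        for p in passages:
            if p not in seen:
                seen.add(p)
                store.append(p)
    return store

def _bow(text):
    # Bag-of-words term counts; a toy stand-in for a learned dense embedding.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    """Return the top-k passages from the shared store by cosine similarity."""
    q = _bow(query)
    # sorted() is stable, so ties keep their original store order.
    return sorted(store, key=lambda p: _cosine(q, _bow(p)), reverse=True)[:k]

def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size (assumed FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Usage: two clients contribute passages; either client can now retrieve from both.
clients = [
    ["paris is the capital of france", "the eiffel tower is in paris"],
    ["berlin is the capital of germany"],
]
store = build_collaborative_store(clients)
print(retrieve("what is the capital of germany", store, k=1))
# -> ['berlin is the capital of germany']
print(fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3]))
# -> [2.5, 3.5]
```

Because the store is pooled, the first client's query can be answered with a passage contributed by the second. The same mechanism can also surface irrelevant passages or hard negatives from other clients, which is the trade-off the abstract highlights.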