CoRAG: Collaborative Retrieval-Augmented Generation

Abstract

Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive tasks, especially under few-shot learning constraints. We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. To evaluate CoRAG, we introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering. Our experiments demonstrate that CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource scenarios. Further analysis reveals the critical importance of relevant passages within the shared store, the surprising benefits of incorporating irrelevant passages, and the potential for hard negatives to degrade performance. This introduces a novel consideration in collaborative RAG: the trade-off between leveraging a collectively enriched knowledge base and the risk of incorporating detrimental passages from other clients. Our findings underscore the viability of CoRAG, while also highlighting key design challenges and promising avenues for future research.
