CoRAG: Collaborative Retrieval-Augmented Generation
April 2, 2025
Authors: Aashiq Muhamed, Mona Diab, Virginia Smith
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive
tasks, especially under few-shot learning constraints. We introduce CoRAG, a
framework extending RAG to collaborative settings, where clients jointly train
a shared model using a collaborative passage store. To evaluate CoRAG, we
introduce CRAB, a benchmark for collaborative homogeneous open-domain question
answering. Our experiments demonstrate that CoRAG consistently outperforms both
parametric collaborative learning methods and locally trained RAG models in
low-resource scenarios. Further analysis reveals the critical importance of
relevant passages within the shared store, the surprising benefits of
incorporating irrelevant passages, and the potential for hard negatives to
negatively impact performance. This introduces a novel consideration in
collaborative RAG: the trade-off between leveraging a collectively enriched
knowledge base and the potential risk of incorporating detrimental passages
from other clients. Our findings underscore the viability of CoRAG, while also
highlighting key design challenges and promising avenues for future research.