CoRAG: Collaborative Retrieval-Augmented Generation

Abstract

Retrieval-Augmented Generation (RAG) models excel at knowledge-intensive tasks, especially under few-shot learning constraints. We introduce CoRAG, a framework that extends RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. To evaluate CoRAG, we introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering. Our experiments show that CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource scenarios. Further analysis reveals the critical importance of relevant passages in the shared store, the surprising benefit of including irrelevant passages, and the potential for hard negatives to hurt performance. This introduces a new consideration in collaborative RAG: the trade-off between leveraging a collectively enriched knowledge base and the risk of incorporating detrimental passages from other clients. Our findings underscore the viability of CoRAG while highlighting key design challenges and promising avenues for future research.
