RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
December 18, 2024
Authors: Zhuoran Jin, Hongbang Yuan, Tianyi Men, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
cs.AI
Abstract
Despite the significant progress made by existing retrieval augmented
language models (RALMs) in providing trustworthy responses and grounding in
reliable sources, they often overlook effective alignment with human
preferences. In the alignment process, reward models (RMs) act as a crucial
proxy for human values to guide optimization. However, it remains unclear how
to evaluate and select a reliable RM for preference alignment in RALMs. To this
end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG
settings. First, we design four crucial and challenging RAG-specific scenarios
to assess RMs, including multi-hop reasoning, fine-grained citation,
appropriate abstain, and conflict robustness. Then, we incorporate 18 RAG
subsets, six retrievers, and 24 RALMs to increase the diversity of data
sources. Finally, we adopt an LLM-as-a-judge approach to improve preference
annotation efficiency and effectiveness, exhibiting a strong correlation with
human annotations. Based on the RAG-RewardBench, we conduct a comprehensive
evaluation of 45 RMs and uncover their limitations in RAG scenarios.
Additionally, we reveal that existing trained RALMs show almost no
improvement in preference alignment, highlighting the need for a shift towards
preference-aligned training. We release our benchmark and code publicly at
https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
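As a quick illustration of how the released benchmark might be used, the sketch below loads the dataset and computes a reward model's pairwise preference accuracy (the fraction of pairs where the chosen response scores above the rejected one). The split name, the field names prompt/chosen/rejected, and the example reward model are assumptions for illustration, not details confirmed by the paper; check the dataset card before use.

```python
# Minimal sketch: score chosen vs. rejected responses with an off-the-shelf RM
# and report pairwise preference accuracy on RAG-RewardBench.
# NOTE: split and field names ("prompt", "chosen", "rejected") are assumed.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

dataset = load_dataset("jinzhuoran/RAG-RewardBench", split="test")  # assumed split name

# Any sequence-classification reward model can be swapped in here; this one is
# just an example and is not one evaluated in the paper.
rm_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(rm_name)
model = AutoModelForSequenceClassification.from_pretrained(rm_name).eval()

def score(prompt: str, response: str) -> float:
    """Return the scalar reward the RM assigns to a (prompt, response) pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

correct = 0
for example in dataset:
    chosen_score = score(example["prompt"], example["chosen"])      # assumed fields
    rejected_score = score(example["prompt"], example["rejected"])  # assumed fields
    correct += int(chosen_score > rejected_score)

print(f"Pairwise preference accuracy: {correct / len(dataset):.3f}")
```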