RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment

Abstract

Despite the significant progress made by existing retrieval augmented language models (RALMs) in providing trustworthy responses and grounding in reliable sources, they often overlook effective alignment with human preferences. In the alignment process, reward models (RMs) act as a crucial proxy for human values to guide optimization. However, it remains unclear how to evaluate and select a reliable RM for preference alignment in RALMs. To this end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG settings. First, we design four crucial and challenging RAG-specific scenarios to assess RMs: multi-hop reasoning, fine-grained citation, appropriate abstain, and conflict robustness. Then, we incorporate 18 RAG subsets, six retrievers, and 24 RALMs to increase the diversity of data sources. Finally, we adopt an LLM-as-a-judge approach to improve the efficiency and effectiveness of preference annotation, and the resulting labels exhibit a strong correlation with human annotations. Based on RAG-RewardBench, we conduct a comprehensive evaluation of 45 RMs and uncover their limitations in RAG scenarios. We also reveal that existing trained RALMs show almost no improvement in preference alignment, highlighting the need for a shift towards preference-aligned training. We release our benchmark and code publicly at https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
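As an illustration of what evaluating an RM on preference data can look like, the following is a minimal sketch of pairwise preference accuracy: the fraction of pairs in which the RM scores the preferred response above the rejected one. The abstract does not state the metric, so this choice is an assumption, as are the record layout, the function names, and the toy scoring function, which stands in for a real reward model.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical record layout (an assumption, not the benchmark's schema):
# (prompt including retrieved passages, chosen response, rejected response)
PreferencePair = Tuple[str, str, str]


def pairwise_accuracy(
    score: Callable[[str, str], float],
    pairs: Iterable[PreferencePair],
) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no preference pairs given")
    correct = sum(
        1
        for prompt, chosen, rejected in pairs
        if score(prompt, chosen) > score(prompt, rejected)
    )
    return correct / len(pairs)


def toy_score(prompt: str, response: str) -> float:
    # Toy stand-in for a reward model: counts citation markers such as "[1]".
    # A real RM would return a learned scalar reward instead.
    return float(response.count("["))


pairs = [
    ("Q1 + passages", "Answer with citation [1].", "Answer without citation."),
    ("Q2 + passages", "Grounded answer [2][3].", "Unsupported answer."),
    ("Q3 + passages", "I cannot answer from the passages.", "Made-up answer [9]."),
]

accuracy = pairwise_accuracy(toy_score, pairs)
print(f"pairwise accuracy: {accuracy:.2f}")
```

The third pair shows why a surface heuristic falls short: the toy scorer rewards the fabricated citation over the appropriate abstention, so it gets that pair wrong.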
