RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
December 18, 2024
Authors: Zhuoran Jin, Hongbang Yuan, Tianyi Men, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
cs.AI
Abstract
Despite the significant progress made by existing retrieval augmented
language models (RALMs) in providing trustworthy responses and grounding in
reliable sources, they often overlook effective alignment with human
preferences. In the alignment process, reward models (RMs) act as a crucial
proxy for human values to guide optimization. However, it remains unclear how
to evaluate and select a reliable RM for preference alignment in RALMs. To this
end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG
settings. First, we design four crucial and challenging RAG-specific scenarios
to assess RMs, including multi-hop reasoning, fine-grained citation,
appropriate abstain, and conflict robustness. Then, we incorporate 18 RAG
subsets, six retrievers, and 24 RALMs to increase the diversity of data
sources. Finally, we adopt an LLM-as-a-judge approach to improve preference
annotation efficiency and effectiveness, exhibiting a strong correlation with
human annotations. Based on the RAG-RewardBench, we conduct a comprehensive
evaluation of 45 RMs and uncover their limitations in RAG scenarios.
Additionally, we reveal that existing trained RALMs show almost no
improvement in preference alignment, highlighting the need for a shift towards
preference-aligned training. We release our benchmark and code publicly at
https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
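For readers who want to experiment with the released data, the sketch below shows one way to load the benchmark with the Hugging Face `datasets` library and compute a reward model's pairwise accuracy on preference pairs. This is not the paper's evaluation code: the split name, the `prompt`/`chosen`/`rejected` field names, and the `score_fn` callable are assumptions for illustration; check the dataset card for the actual schema.

```python
# Minimal sketch (assumed schema, not the authors' evaluation script):
# load RAG-RewardBench and measure how often a reward model ranks the
# preferred response above the dispreferred one.
from datasets import load_dataset


def pairwise_accuracy(score_fn, split="train"):
    # Repo id taken from the paper's URL; split name is an assumption.
    ds = load_dataset("jinzhuoran/RAG-RewardBench", split=split)
    correct = 0
    for example in ds:
        prompt = example["prompt"]        # assumed field name
        chosen = example["chosen"]        # assumed: preferred response
        rejected = example["rejected"]    # assumed: dispreferred response
        # The RM "wins" this pair if it scores the preferred response higher.
        if score_fn(prompt, chosen) > score_fn(prompt, rejected):
            correct += 1
    return correct / len(ds)
```

Here `score_fn` stands in for any reward model that maps a (prompt, response) pair to a scalar reward, e.g. a sequence-classification head or an LLM-as-a-judge scorer.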