RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment

Abstract

Existing retrieval-augmented language models (RALMs) have made significant progress in providing trustworthy responses grounded in reliable sources, but they often overlook effective alignment with human preferences. In the alignment process, reward models (RMs) act as a crucial proxy for human values to guide optimization. However, it remains unclear how to evaluate and select a reliable RM for preference alignment in RALMs. To this end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG settings. First, we design four crucial and challenging RAG-specific scenarios to assess RMs: multi-hop reasoning, fine-grained citation, appropriate abstention, and conflict robustness. Then, we incorporate 18 RAG subsets, six retrievers, and 24 RALMs to increase the diversity of data sources. Finally, we adopt an LLM-as-a-judge approach to improve the efficiency and effectiveness of preference annotation; the resulting annotations correlate strongly with human annotations. Using RAG-RewardBench, we conduct a comprehensive evaluation of 45 RMs and uncover their limitations in RAG scenarios. We also find that existing trained RALMs show almost no improvement in preference alignment, highlighting the need for a shift towards preference-aligned training. We release our benchmark and code publicly at https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
