CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Abstract

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite widespread attention, existing academic research predominantly focuses on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various conversational RAG methods and conduct a comprehensive evaluation of these methods on CORAL, demonstrating substantial opportunities for improving existing approaches.
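To make the three tasks concrete, the following is a minimal sketch of how one conversational turn could pass through passage retrieval, response generation, and citation labeling. It is not CORAL's framework or data format: the `Turn` class, the helper functions, the token-overlap retriever, the extractive "generator", and the toy passages are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One conversational turn (hypothetical structure, not CORAL's schema)."""
    question: str
    response: str = ""
    citations: list[int] = field(default_factory=list)  # indices into the passage pool


def tokenize(text: str) -> set[str]:
    # Lowercased word set with surrounding punctuation stripped.
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


def build_query(history: list[Turn], question: str) -> str:
    # Naive context handling: concatenate earlier questions with the current one.
    return " ".join([t.question for t in history] + [question])


def retrieve(query: str, passages: list[str], k: int = 2) -> list[int]:
    # Task 1, passage retrieval: toy ranking by token overlap with the query.
    q = tokenize(query)
    overlap = [len(q & tokenize(p)) for p in passages]
    # Higher overlap first; ties broken by lower passage index.
    ranked = sorted(range(len(passages)), key=lambda i: (-overlap[i], i))
    return ranked[:k]


def generate(retrieved: list[int], passages: list[str]) -> str:
    # Task 2, response generation: stand-in for an LLM call that simply
    # returns the top-ranked passage (assumption, purely extractive).
    return passages[retrieved[0]]


def label_citations(response: str, retrieved: list[int],
                    passages: list[str], min_overlap: int = 3) -> list[int]:
    # Task 3, citation labeling: cite retrieved passages that share at least
    # `min_overlap` tokens with the response (threshold is an assumption).
    r = tokenize(response)
    return [i for i in retrieved if len(r & tokenize(passages[i])) >= min_overlap]


def answer_turn(history: list[Turn], question: str, passages: list[str]) -> Turn:
    retrieved = retrieve(build_query(history, question), passages)
    response = generate(retrieved, passages)
    return Turn(question, response, label_citations(response, retrieved, passages))


# Toy passage pool and a two-turn conversation (made up for this sketch).
passages = [
    "Coral reefs are built by colonies of coral polyps.",
    "The Great Barrier Reef is located off the coast of Queensland, Australia.",
    "Python is a programming language.",
]
history = [Turn("What is the Great Barrier Reef?")]
retrieved = retrieve(build_query(history, "Where is it located?"), passages)
turn = answer_turn(history, "Where is it located?", passages)
```

In a real system each stub would be replaced by an actual component, for example a dense retriever, an LLM generator, and a learned or prompted citation labeler; the sketch only shows how the outputs of the three tasks relate within a single turn.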
