CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

October 30, 2024
Authors: Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou
cs.AI

Abstract

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite the widespread attention it has received, existing academic research predominantly focuses on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various conversational RAG methods and conduct a comprehensive evaluation of these methods on CORAL, demonstrating substantial opportunities for improving existing approaches.
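To make the three tasks named above concrete, here is a minimal toy sketch of how they chain together in one conversational turn. It is not CORAL's unified framework or any method evaluated in the paper. It makes several assumptions. Retrieval is plain term overlap with a hypothetical stopword list. Conversation history is handled by concatenating the previous question(s) to the current one, which is one simple strategy among many. "Generation" is an extractive stub standing in for an LLM. All names (`Turn`, `retrieve`, `generate`, `label_citations`, `answer_turn`) are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stopword list for the toy lexical retriever.
STOPWORDS = {"a", "an", "the", "is", "was", "of", "in", "and", "by", "who",
             "what", "which", "did", "she", "her", "his", "with"}


@dataclass
class Turn:
    question: str
    response: str = ""
    cited: list[int] = field(default_factory=list)  # ids of passages cited in the response


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(history: list[Turn], question: str, corpus: list[str],
             k: int = 2, window: int = 1) -> list[int]:
    """Task 1 (passage retrieval): rank passages by term overlap with a
    query built from the last `window` questions plus the current one."""
    context = " ".join(t.question for t in history[-window:]) if window > 0 else ""
    query = {w for w in tokenize(context + " " + question) if w not in STOPWORDS}
    scores = []
    for passage in corpus:
        counts = Counter(tokenize(passage))
        scores.append(sum(counts[w] for w in query))
    # Sort by descending score; break ties by lower passage id.
    ranked = sorted(range(len(corpus)), key=lambda i: (-scores[i], i))
    return [i for i in ranked[:k] if scores[i] > 0]


def generate(passage_ids: list[int], corpus: list[str]) -> list[tuple[str, int]]:
    """Task 2 (response generation), stubbed: a real system would prompt an
    LLM; here we take the first sentence of each passage and keep its source id."""
    return [(corpus[pid].split(". ")[0].rstrip("."), pid) for pid in passage_ids]


def label_citations(sentences: list[tuple[str, int]]) -> tuple[str, list[int]]:
    """Task 3 (citation labeling): attach a [pid] marker to each sentence."""
    text = " ".join(f"{s} [{pid}]." for s, pid in sentences)
    return text, sorted({pid for _, pid in sentences})


def answer_turn(history: list[Turn], question: str, corpus: list[str], k: int = 2) -> Turn:
    ids = retrieve(history, question, corpus, k=k)
    response, cited = label_citations(generate(ids, corpus))
    turn = Turn(question, response, cited)
    history.append(turn)
    return turn


corpus = [
    "Marie Curie was a physicist and chemist. She won two Nobel Prizes.",
    "Pierre Curie shared the 1903 Nobel Prize in Physics with his wife.",
    "The element oxygen was discovered by Joseph Priestley in 1774.",
    "Marie Curie discovered the element radium in 1898.",
]
history: list[Turn] = []
t1 = answer_turn(history, "Who was Marie Curie?", corpus, k=1)
t2 = answer_turn(history, "Which element was discovered by her?", corpus, k=1)
print(t1.response)  # Marie Curie was a physicist and chemist [0].
print(t2.response)  # Marie Curie discovered the element radium in 1898 [3].
```

The second question only identifies its subject through the pronoun "her". With the previous turn in the query, the toy retriever ranks the radium passage first. With no history (`retrieve([], ...)`), the oxygen passage ties with it on term overlap and wins the tie, which illustrates why multi-turn context matters for retrieval.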
