CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Abstract

Retrieval-Augmented Generation (RAG) has become a powerful paradigm for enhancing large language models (LLMs) through external knowledge retrieval. Despite widespread attention, existing academic research focuses predominantly on single-turn RAG, leaving a significant gap in addressing the complexities of multi-turn conversations found in real-world applications. To bridge this gap, we introduce CORAL, a large-scale benchmark designed to assess RAG systems in realistic multi-turn conversational settings. CORAL includes diverse information-seeking conversations automatically derived from Wikipedia and tackles key challenges such as open-domain coverage, knowledge intensity, free-form responses, and topic shifts. It supports three core tasks of conversational RAG: passage retrieval, response generation, and citation labeling. We propose a unified framework to standardize various conversational RAG methods and conduct a comprehensive evaluation of these methods on CORAL, demonstrating substantial opportunities for improving existing approaches.
