BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
October 2, 2024
Authors: Bryan Li, Samar Haider, Fiona Luo, Adwait Agashe, Chris Callison-Burch
cs.AI
Abstract
Large language models excel at creative generation but continue to struggle
with the issues of hallucination and bias. While retrieval-augmented generation
(RAG) provides a framework for grounding LLMs' responses in accurate and
up-to-date information, it still raises the question of bias: which sources
should be selected for inclusion in the context? And how should their
importance be weighted? In this paper, we study the challenge of cross-lingual
RAG and present a dataset to investigate the robustness of existing systems at
answering queries about geopolitical disputes, which exist at the intersection
of linguistic, cultural, and political boundaries. Our dataset is sourced from
Wikipedia pages containing information relevant to the given queries and we
investigate the impact of including additional context, as well as the
composition of this context in terms of language and source, on an LLM's
response. Our results show that existing RAG systems continue to be challenged
by cross-lingual use cases and suffer from a lack of consistency when they are
provided with competing information in multiple languages. We present case
studies to illustrate these issues and outline steps for future research to
address these challenges. We make our dataset and code publicly available at
https://github.com/manestay/bordIRlines.