

REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models

February 19, 2025
Authors: DongGeon Lee, Hwanjo Yu
cs.AI

Abstract

Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering. To address this challenge, we introduce REFIND (Retrieval-augmented Factuality hallucINation Detection), a novel framework that detects hallucinated spans within LLM outputs by directly leveraging retrieved documents. As part of REFIND, we propose the Context Sensitivity Ratio (CSR), a novel metric that quantifies the sensitivity of LLM outputs to retrieved evidence. This innovative approach enables REFIND to efficiently and accurately detect hallucinations, setting it apart from existing methods. In our evaluation, REFIND demonstrated robustness across nine languages, including low-resource settings, and significantly outperformed baseline models, achieving superior IoU scores in identifying hallucinated spans. This work highlights the effectiveness of quantifying context sensitivity for hallucination detection, thereby paving the way for more reliable and trustworthy LLM applications across diverse languages.
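The abstract does not give the formula for the Context Sensitivity Ratio, but its description (quantifying how sensitive output tokens are to retrieved evidence) suggests a per-token probability comparison with and without the retrieved documents in the prompt. The sketch below is one hypothetical instantiation of that idea, not the paper's actual definition: the model choice, prompt templates, the `context_sensitivity_ratio` function, and the 0.5 flagging threshold are all illustrative assumptions.

```python
# Hypothetical sketch of a Context Sensitivity Ratio (CSR) style score.
# The ratio below (token probability with vs. without the retrieved
# document in the prompt) is an illustrative assumption, not the
# paper's definition.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM works; "gpt2" keeps the sketch small
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def token_logprobs(prompt: str, answer: str) -> torch.Tensor:
    """Log-probability of each answer token given the prompt.

    Tokenizing prompt and answer separately is an approximation; a
    careful implementation would align tokenizations of the full string.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # The logit at position i predicts token i + 1, so shift by one
    # and keep only the answer region.
    start = prompt_ids.shape[1]
    logprobs = torch.log_softmax(logits[:, start - 1 : -1], dim=-1)
    return logprobs.gather(2, answer_ids.unsqueeze(-1)).squeeze(-1).squeeze(0)


def context_sensitivity_ratio(question: str, answer: str, evidence: str) -> torch.Tensor:
    """Per-token ratio p(t_i | evidence, question) / p(t_i | question)."""
    with_ctx = token_logprobs(f"{evidence}\nQuestion: {question}\nAnswer:", answer)
    without_ctx = token_logprobs(f"Question: {question}\nAnswer:", answer)
    return torch.exp(with_ctx - without_ctx)


# Usage: tokens whose probability drops when the evidence is added
# (CSR below a threshold) are candidate hallucinated spans.
csr = context_sensitivity_ratio(
    "Who wrote Hamlet?",
    " Christopher Marlowe wrote it.",
    "Hamlet is a tragedy written by William Shakespeare around 1600.",
)
flagged = csr < 0.5  # 0.5 is an arbitrary illustrative threshold
```

In a full pipeline, per-token scores like this would be aggregated into contiguous spans and compared against gold annotations via IoU, which is the span-level metric the abstract reports.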
