LettuceDetect: A Hallucination Detection Framework for RAG Applications

February 24, 2025
Authors: Ádám Kovács, Gábor Recski
cs.AI

Abstract

Retrieval-Augmented Generation (RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources. We present LettuceDetect, a framework that addresses two critical limitations of existing hallucination detection methods: (1) the context-window constraints of traditional encoder-based methods, and (2) the computational inefficiency of LLM-based approaches. Building on ModernBERT's extended context capabilities (up to 8k tokens) and trained on the RAGTruth benchmark dataset, our approach outperforms all previous encoder-based models and most prompt-based models, while being approximately 30 times smaller than the best models. LettuceDetect is a token-classification model that processes context-question-answer triples, which allows it to identify unsupported claims at the token level. Evaluations on the RAGTruth corpus show an F1 score of 79.22% for example-level detection, a 14.8% improvement over Luna, the previous state-of-the-art encoder-based architecture. Additionally, the system can process 30 to 60 examples per second on a single GPU, making it more practical for real-world RAG applications.
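The abstract describes token-level detection that is then scored at the example level. The sketch below illustrates one plausible way that post-processing could work; it is not LettuceDetect's code or API. It assumes a token classifier (not shown) has already produced a hallucination probability for each answer token, that context and question tokens are not scored, that a 0.5 threshold is used, and that tokens are whitespace-split words for readability. The names `tokens_to_spans`, `example_is_hallucinated`, and `example_level_f1` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    start: int  # index of the first flagged answer token (inclusive)
    end: int    # index one past the last flagged token (exclusive)
    text: str   # flagged tokens joined with spaces (illustrative only)


def tokens_to_spans(tokens: List[str], probs: List[float],
                    threshold: float = 0.5) -> List[Span]:
    """Group consecutive answer tokens whose (assumed) hallucination
    probability exceeds `threshold` into contiguous spans."""
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    spans: List[Span] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # open a new flagged span
        elif p <= threshold and start is not None:
            spans.append(Span(start, i, " ".join(tokens[start:i])))
            start = None  # close the current span
    if start is not None:  # a span that runs to the end of the answer
        spans.append(Span(start, len(tokens), " ".join(tokens[start:])))
    return spans


def example_is_hallucinated(spans: List[Span]) -> bool:
    # Example-level label: hallucinated if at least one token was flagged.
    return len(spans) > 0


def example_level_f1(preds: List[bool], golds: List[bool]) -> float:
    """Standard F1 with 'hallucinated' as the positive class."""
    tp = sum(p and g for p, g in zip(preds, golds))
    fp = sum(p and not g for p, g in zip(preds, golds))
    fn = sum(g and not p for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up probabilities for illustration, not model output.
    answer = ["The", "Eiffel", "Tower", "is", "500", "meters", "tall", "."]
    probs = [0.02, 0.01, 0.03, 0.05, 0.91, 0.88, 0.12, 0.01]
    spans = tokens_to_spans(answer, probs)
    print(spans)  # [Span(start=4, end=6, text='500 meters')]
    print(example_is_hallucinated(spans))  # True
```

Treating an example as hallucinated whenever any token is flagged is the simplest aggregation rule. A real system might instead require a minimum span length or tune the threshold on validation data.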
