HALoGEN: Fantastic LLM Hallucinations and Where to Find Them

January 14, 2025
Authors: Abhilasha Ravichander, Shrusti Ghela, David Wadden, Yejin Choi
cs.AI

Abstract

Despite their impressive ability to generate high-quality and fluent text, generative large language models (LLMs) also produce hallucinations: statements that are misaligned with established world knowledge or provided input context. However, measuring hallucination can be challenging, as having humans verify model generations on-the-fly is both expensive and time-consuming. In this work, we release HALoGEN, a comprehensive hallucination benchmark consisting of: (1) 10,923 prompts for generative models spanning nine domains including programming, scientific attribution, and summarization, and (2) automatic high-precision verifiers for each use case that decompose LLM generations into atomic units, and verify each unit against a high-quality knowledge source. We use this framework to evaluate ~150,000 generations from 14 language models, finding that even the best-performing models are riddled with hallucinations (sometimes up to 86% of generated atomic facts depending on the domain). We further define a novel error classification for LLM hallucinations based on whether they likely stem from incorrect recollection of training data (Type A errors), or incorrect knowledge in training data (Type B errors), or are fabrication (Type C errors). We hope our framework provides a foundation to enable the principled study of why generative models hallucinate, and advances the development of trustworthy large language models.
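
The verification scheme described above (decompose a generation into atomic units, check each unit against a knowledge source, report the fraction that fails) can be illustrated with a minimal sketch. This is not the HALoGEN implementation: the function names, the import-based decomposition, and the `KNOWN_PACKAGES` allow-list are all hypothetical stand-ins chosen to make the idea concrete for a programming-style prompt.

```python
from typing import Callable, List, Set


def hallucination_rate(
    generation: str,
    decompose: Callable[[str], List[str]],
    is_supported: Callable[[str], bool],
) -> float:
    """Fraction of atomic units in `generation` that the verifier rejects."""
    units = decompose(generation)
    if not units:
        # No checkable units: report 0.0 rather than dividing by zero.
        return 0.0
    unsupported = [u for u in units if not is_supported(u)]
    return len(unsupported) / len(units)


# Hypothetical knowledge source: a tiny allow-list of real package names.
KNOWN_PACKAGES: Set[str] = {"numpy", "pandas", "requests"}


def decompose_imports(code: str) -> List[str]:
    """Treat each imported top-level package name as one atomic unit."""
    units = []
    for line in code.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] in ("import", "from"):
            units.append(parts[1].split(".")[0])
    return units


def package_exists(name: str) -> bool:
    """Verify one atomic unit against the (assumed) knowledge source."""
    return name in KNOWN_PACKAGES


sample = (
    "import numpy\n"
    "import fastmathx\n"
    "from pandas import DataFrame\n"
    "import quickplotz"
)
rate = hallucination_rate(sample, decompose_imports, package_exists)
print(f"{rate:.2f}")  # 2 of 4 imported packages are not in the allow-list
```

Passing the decomposer and verifier as arguments mirrors the abstract's point that each use case gets its own verifier, while the scoring step stays the same across domains.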
