TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
March 13, 2025
Authors: Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu
cs.AI
Abstract
Object Hallucination (OH) has been acknowledged as one of the major
trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent
advancements in Large Language Models (LLMs) indicate that internal states,
such as hidden states, encode the "overall truthfulness" of generated
responses. However, it remains under-explored how internal states in LVLMs
function and whether they could serve as "per-token" hallucination indicators,
which is essential for mitigating OH. In this paper, we first conduct an
in-depth exploration of LVLM internal states in relation to OH issues and
discover that (1) LVLM internal states are high-specificity per-token
indicators of hallucination behaviors. Moreover, (2) different LVLMs encode
universal patterns of hallucinations in common latent subspaces, indicating
that there exist "generic truthful directions" shared by various LVLMs. Based
on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt)
that first learns the truthful direction of LVLM decoding and then applies
truthful-guided inference-time intervention during LVLM decoding. We further
propose ComnHallu to enhance both cross-LVLM and cross-data hallucination
detection transferability by constructing and aligning hallucination latent
subspaces. We evaluate TruthPrInt in extensive experimental settings, including
in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks.
Experimental results indicate that TruthPrInt significantly outperforms
state-of-the-art methods. Code will be available at
https://github.com/jinhaoduan/TruthPrInt.
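
The abstract outlines TruthPrInt at a high level: a per-token truthfulness signal is read from LVLM hidden states, a "truthful direction" is learned, and decoding is steered along that direction at inference time. The PyTorch sketch below is only a minimal illustration of that idea; the linear probe, the difference-of-means direction, and the names (`TruthfulProbe`, `pre_intervene`, `alpha`, `threshold`) are assumptions for illustration, not taken from the authors' released code.

```python
# Minimal, illustrative sketch (not the authors' code): a linear per-token
# truthfulness probe over LVLM hidden states plus a simple steering rule
# along a learned "truthful direction". All names and hyperparameters here
# (TruthfulProbe, pre_intervene, alpha, threshold) are assumptions.
import torch
import torch.nn as nn


class TruthfulProbe(nn.Module):
    """Scores each token's hidden state; higher means more likely truthful."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) -> scores: (batch, seq_len)
        return self.linear(hidden_states).squeeze(-1)


def truthful_direction(truthful_h: torch.Tensor,
                       hallucinated_h: torch.Tensor) -> torch.Tensor:
    """Unit difference-of-means direction between hidden states of tokens
    labeled truthful vs. hallucinated (one simple way to get a direction)."""
    d = truthful_h.mean(dim=0) - hallucinated_h.mean(dim=0)
    return d / d.norm()


def pre_intervene(hidden_states: torch.Tensor,
                  probe: TruthfulProbe,
                  direction: torch.Tensor,
                  threshold: float = 0.0,
                  alpha: float = 1.0) -> torch.Tensor:
    """Nudge hidden states of tokens the probe flags as likely hallucinated
    along the truthful direction before the next decoding step."""
    scores = probe(hidden_states)                      # (batch, seq_len)
    mask = (scores < threshold).float().unsqueeze(-1)  # (batch, seq_len, 1)
    return hidden_states + alpha * mask * direction    # broadcast over hidden_dim
```

In a full pipeline this kind of hook would be applied to selected decoder layers during autoregressive generation; the per-token scores could equally trigger re-decoding instead of additive steering, a design choice the abstract itself does not specify.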
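
ComnHallu is described only as "constructing and aligning hallucination latent subspaces" to transfer detection across LVLMs and datasets. One standard recipe for that kind of alignment is an SVD-based subspace per model followed by an orthogonal Procrustes rotation between the projected coordinates; the sketch below assumes that recipe, plus paired hidden states from the same inputs run through both models, which may differ from the paper's actual procedure.

```python
# Illustrative sketch of cross-model subspace construction and alignment.
# SVD-based subspaces plus an orthogonal Procrustes rotation is one common
# recipe; whether ComnHallu follows exactly this procedure is an assumption.
import numpy as np


def hallucination_subspace(features: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (dim, k) of the top-k directions of the centered
    hidden-state features collected from one LVLM."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def procrustes_rotation(z_target: np.ndarray, z_source: np.ndarray) -> np.ndarray:
    """Orthogonal rotation R (k, k) such that z_target @ R best matches
    z_source, estimated from hidden states of the same inputs run through
    both models (a paired-sample assumption)."""
    u, _, vt = np.linalg.svd(z_target.T @ z_source)
    return u @ vt


# Hypothetical usage: project each model's hidden states into its own
# k-dim subspace, align the target coordinates to the source frame, then
# reuse a hallucination detector trained in the source frame.
# b_src = hallucination_subspace(h_src, k=32); b_tgt = hallucination_subspace(h_tgt, k=32)
# z_src, z_tgt = h_src @ b_src, h_tgt @ b_tgt
# z_tgt_aligned = z_tgt @ procrustes_rotation(z_tgt, z_src)
```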