Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

October 3, 2024
Authors: Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman
cs.AI

Abstract

We investigate the internal representations of vision-language models (VLMs) to address hallucinations, a persistent challenge despite advances in model size and training. We project VLMs' internal image representations to their language vocabulary and observe more confident output probabilities on real objects than hallucinated objects. We additionally use these output probabilities to spatially localize real objects. Building on this approach, we introduce a knowledge erasure algorithm that removes hallucinations by linearly orthogonalizing image features with respect to hallucinated object features. We show that targeted edits to a model's latent representations can reduce hallucinations by up to 25.7% on the COCO2014 dataset while preserving performance. Our findings demonstrate how a deeper understanding of VLMs' latent representations can enhance reliability and enable novel capabilities, such as zero-shot segmentation.
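Below is a minimal sketch of the two ideas described in the abstract: reading out image latents through the language vocabulary (a logit-lens-style projection) and erasing a hallucinated object by linearly orthogonalizing image features against that object's feature direction. This is an illustrative assumption of how such an edit could look, not the paper's actual implementation; the names (`unembedding`, `image_features`, the token id, and the tensor shapes) are all placeholders.

```python
import torch

def logit_lens_probs(image_features, unembedding):
    """Project internal image representations onto the language vocabulary
    and return per-patch output probabilities (logit-lens-style readout).

    image_features: (num_patches, hidden_dim) latent image tokens
    unembedding:    (vocab_size, hidden_dim) language-head weights
    """
    logits = image_features @ unembedding.T          # (num_patches, vocab_size)
    return torch.softmax(logits, dim=-1)

def erase_object(image_features, object_embedding):
    """Remove an object's direction from the image features by linear
    orthogonalization: subtract each feature's projection onto the object
    embedding, keeping only the component orthogonal to it."""
    direction = object_embedding / object_embedding.norm()   # unit vector
    coeffs = image_features @ direction                       # (num_patches,)
    return image_features - coeffs.unsqueeze(-1) * direction

# Hypothetical usage: suppose the model hallucinates "umbrella". Take the
# unembedding row for that token as the object feature and edit the latents.
hidden_dim, vocab_size, num_patches = 4096, 32000, 576
unembedding = torch.randn(vocab_size, hidden_dim)
image_features = torch.randn(num_patches, hidden_dim)
umbrella_id = 1234                                             # placeholder token id
edited_features = erase_object(image_features, unembedding[umbrella_id])
patch_probs = logit_lens_probs(edited_features, unembedding)   # per-patch vocab probs
```

The same per-patch probabilities, read before any edit, are what the abstract refers to for spatially localizing real objects: patches with confident probability mass on an object's token can be thresholded into a mask, which is one plausible route to the zero-shot segmentation capability mentioned above.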
