Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
November 22, 2024
Authors: Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu
cs.AI
Abstract
Recent advances in Large Multimodal Models (LMMs) have led to significant breakthroughs in both academia and industry. One question that arises is how we, as humans, can understand their internal neural representations. This paper takes an initial step towards addressing this question by presenting a versatile framework to identify and interpret the semantics within LMMs. Specifically, 1) we first apply a Sparse Autoencoder (SAE) to disentangle the representations into human-understandable features. 2) We then present an automatic interpretation framework in which the open-semantic features learned by the SAE are interpreted by LMMs themselves. We employ this framework to analyze the LLaVA-NeXT-8B model using the LLaVA-OV-72B model, demonstrating that these features can effectively steer the model's behavior. Our results contribute to a deeper understanding of why LMMs excel in specific tasks, including EQ tests, and illuminate the nature of their mistakes along with potential strategies for their rectification. These findings offer new insights into the internal mechanisms of LMMs and suggest parallels with the cognitive processes of the human brain.
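The abstract describes two technical steps: training a sparse autoencoder on LMM hidden states to obtain sparse, human-understandable features, and then steering the model by intervening along a learned feature direction. The following is a minimal sketch of that general technique, not the paper's implementation; the model dimensions, top-k sparsity rule, feature index, and steering strength are illustrative assumptions.

```python
# Sketch: top-k sparse autoencoder over cached LMM hidden states, plus
# activation steering along one learned feature's decoder direction.
# All sizes and indices below are hypothetical, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int, k: int = 32):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.k = k  # number of features kept active per token

    def encode(self, h: torch.Tensor) -> torch.Tensor:
        # Project into an overcomplete feature space and keep only the
        # top-k activations to enforce sparsity.
        a = torch.relu(self.encoder(h))
        topk = torch.topk(a, self.k, dim=-1)
        return torch.zeros_like(a).scatter_(-1, topk.indices, topk.values)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encode(h))


# Reconstruction objective on a stand-in batch of residual-stream activations.
d_model, d_features = 4096, 65536          # illustrative sizes
sae = SparseAutoencoder(d_model, d_features)
hidden = torch.randn(8, d_model)           # placeholder for cached LMM activations
loss = F.mse_loss(sae(hidden), hidden)

# Steering sketch: add one interpreted feature's decoder direction back into
# the hidden state to push the model toward that concept.
feature_id, alpha = 123, 4.0                       # hypothetical feature index and strength
direction = sae.decoder.weight[:, feature_id]      # (d_model,) column for this feature
steered_hidden = hidden + alpha * direction
```

In practice the reconstructed or steered activations would be patched back into the host model's forward pass; the choice of layer and steering strength is a tuning decision left unspecified here.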