Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
February 27, 2025
Authors: Tianyi Lorena Yan, Robin Jia
cs.AI
Abstract
To answer one-to-many factual queries (e.g., listing cities of a country), a
language model (LM) must simultaneously recall knowledge and avoid repeating
previous answers. How are these two subtasks implemented and integrated
internally? Across multiple datasets and models, we identify a
promote-then-suppress mechanism: the model first recalls all answers, and then
suppresses previously generated ones. Specifically, LMs use both the subject
and previous answer tokens to perform knowledge recall, with attention
propagating subject information and MLPs promoting the answers. Then, attention
attends to and suppresses previous answer tokens, while MLPs amplify the
suppression signal. Our mechanism is corroborated by extensive experimental
evidence: in addition to using early decoding and causal tracing, we analyze
how components use different tokens by introducing both Token Lens, which
decodes aggregated attention updates from specified tokens, and a knockout
method that analyzes changes in MLP outputs after removing attention to
specified tokens. Overall, we provide new insights into how LMs' internal
components interact with different input tokens to support complex factual
recall. Code is available at
https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
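The Token Lens idea described above — aggregating the attention updates contributed by specific source tokens and decoding them into vocabulary space — can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, tensor shapes, and the use of random weights are all assumptions for illustration; the core idea is simply masking attention to the chosen tokens, summing their value contributions, and projecting the result through the output and unembedding matrices, logit-lens style.

```python
import numpy as np

def token_lens(attn_weights, value_states, W_O, W_U, source_idx):
    """Hedged sketch of a Token-Lens-style readout (shapes are assumptions).

    attn_weights: (heads, seq)        attention from the last position to each token
    value_states: (heads, seq, d_head) per-head value vectors
    W_O:          (heads * d_head, d_model) attention output projection
    W_U:          (d_model, vocab)    unembedding matrix
    source_idx:   indices of the tokens whose contribution we decode
    """
    heads, seq, d_head = value_states.shape
    # Keep only the attention mass on the specified source tokens.
    mask = np.zeros(seq)
    mask[source_idx] = 1.0
    # Per-head update contributed by the selected tokens: sum_j a_j * v_j.
    update = np.einsum("hs,hsd->hd", attn_weights * mask, value_states)
    # Concatenate heads and project back into the residual stream.
    residual_update = update.reshape(heads * d_head) @ W_O
    # Decode the aggregated attention update into vocabulary logits.
    return residual_update @ W_U
```

Because the readout is linear in the attention mask, the decoded logits from disjoint token sets sum to the full update's logits, which makes per-token-group attributions (e.g., subject tokens vs. previous-answer tokens) directly comparable.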