Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
February 27, 2025
Authors: Tianyi Lorena Yan, Robin Jia
cs.AI
Abstract
To answer one-to-many factual queries (e.g., listing cities of a country), a
language model (LM) must simultaneously recall knowledge and avoid repeating
previous answers. How are these two subtasks implemented and integrated
internally? Across multiple datasets and models, we identify a
promote-then-suppress mechanism: the model first recalls all answers, and then
suppresses previously generated ones. Specifically, LMs use both the subject
and previous answer tokens to perform knowledge recall, with attention
propagating subject information and MLPs promoting the answers. Then, attention
attends to and suppresses previous answer tokens, while MLPs amplify the
suppression signal. Our mechanism is corroborated by extensive experimental
evidence: in addition to using early decoding and causal tracing, we analyze
how components use different tokens by introducing both Token Lens, which
decodes aggregated attention updates from specified tokens, and a knockout
method that analyzes changes in MLP outputs after removing attention to
specified tokens. Overall, we provide new insights into how LMs' internal
components interact with different input tokens to support complex factual
recall. Code is available at
https://github.com/Lorenayannnnn/how-lms-answer-one-to-many-factual-queries.
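The Token Lens idea described above — aggregating the attention updates contributed by specific source tokens and decoding them into vocabulary space — can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, tensor shapes, and the use of random weights are all assumptions for illustration; the core idea is simply masking attention to the chosen tokens, summing their value contributions, and projecting the result through the output and unembedding matrices, logit-lens style.

```python
import numpy as np

def token_lens(attn_weights, value_states, W_O, W_U, source_idx):
    """Hedged sketch of a Token-Lens-style readout (shapes are assumptions).

    attn_weights: (heads, seq)        attention from the last position to each token
    value_states: (heads, seq, d_head) per-head value vectors
    W_O:          (heads * d_head, d_model) attention output projection
    W_U:          (d_model, vocab)    unembedding matrix
    source_idx:   indices of the tokens whose contribution we decode
    """
    heads, seq, d_head = value_states.shape
    # Keep only the attention mass on the specified source tokens.
    mask = np.zeros(seq)
    mask[source_idx] = 1.0
    # Per-head update contributed by the selected tokens: sum_j a_j * v_j.
    update = np.einsum("hs,hsd->hd", attn_weights * mask, value_states)
    # Concatenate heads and project back into the residual stream.
    residual_update = update.reshape(heads * d_head) @ W_O
    # Decode the aggregated attention update into vocabulary logits.
    return residual_update @ W_U
```

Because the readout is linear in the attention mask, the decoded logits from disjoint token sets sum to the full update's logits, which makes per-token-group attributions (e.g., subject tokens vs. previous-answer tokens) directly comparable.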