Analysing the Residual Stream of Language Models Under Knowledge Conflicts
October 21, 2024
Authors: Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
cs.AI
Abstract
Large language models (LLMs) can store a significant amount of factual
knowledge in their parameters. However, their parametric knowledge may conflict
with the information provided in the context. Such conflicts can lead to
undesirable model behaviour, such as reliance on outdated or incorrect
information. In this work, we investigate whether LLMs can identify knowledge
conflicts and whether it is possible to know which source of knowledge the
model will rely on by analysing the residual stream of the LLM. Through probing
tasks, we find that LLMs can internally register the signal of knowledge
conflict in the residual stream, which can be accurately detected by probing
the intermediate model activations. This allows us to detect conflicts within
the residual stream before generating the answers, without modifying the input
or model parameters. Moreover, we find that the residual stream shows
significantly different patterns when the model relies on contextual knowledge
versus parametric knowledge to resolve conflicts. This pattern can be employed
to estimate the behaviour of LLMs when conflict happens and prevent unexpected
answers before producing the answers. Our analysis offers insights into how
LLMs internally manage knowledge conflicts and provides a foundation for
developing methods to control the knowledge selection processes.
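
To make the probing setup concrete, below is a minimal sketch of the general technique the abstract describes: extract the residual-stream activation at an intermediate layer for each prompt and fit a linear probe that classifies whether the context conflicts with the model's parametric knowledge. The model name (`gpt2`), layer index, and toy prompts are illustrative assumptions, not the paper's actual experimental setup.

```python
# Minimal sketch (not the authors' code) of probing residual-stream
# activations for knowledge-conflict signals. Model, layer, and the
# toy examples below are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any causal LM exposing hidden states
LAYER = 6            # assumption: an intermediate layer to probe

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def residual_state(prompt: str) -> torch.Tensor:
    """Residual-stream activation at the last token of `prompt`, layer LAYER."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[0] is the embedding output; LAYER indexes a transformer block.
    return out.hidden_states[LAYER][0, -1, :]

# Hypothetical probe data: prompts whose context contradicts (label 1)
# or agrees with (label 0) the model's parametric knowledge.
prompts = [
    ("Context: The Eiffel Tower is in Rome. Q: Where is the Eiffel Tower?", 1),
    ("Context: The Eiffel Tower is in Paris. Q: Where is the Eiffel Tower?", 0),
    ("Context: Einstein was born in Spain. Q: Where was Einstein born?", 1),
    ("Context: Einstein was born in Germany. Q: Where was Einstein born?", 0),
]

X = torch.stack([residual_state(p) for p, _ in prompts]).numpy()
y = [label for _, label in prompts]

# A linear probe: if conflict is linearly decodable from the residual
# stream, the classifier separates the two classes.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```

The same recipe, with labels indicating which knowledge source the model ultimately relied on instead of whether a conflict exists, would correspond to the abstract's second finding: that contextual-reliance and parametric-reliance cases leave distinguishable patterns in the residual stream.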