Analysing the Residual Stream of Language Models Under Knowledge Conflicts
October 21, 2024
Authors: Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
cs.AI
Abstract
Large language models (LLMs) can store a significant amount of factual
knowledge in their parameters. However, their parametric knowledge may conflict
with the information provided in the context. Such conflicts can lead to
undesirable model behaviour, such as reliance on outdated or incorrect
information. In this work, we investigate whether LLMs can identify knowledge
conflicts and whether it is possible to know which source of knowledge the
model will rely on by analysing the residual stream of the LLM. Through probing
tasks, we find that LLMs can internally register the signal of knowledge
conflict in the residual stream, which can be accurately detected by probing
the intermediate model activations. This allows us to detect conflicts within
the residual stream before generating the answers without modifying the input
or model parameters. Moreover, we find that the residual stream shows
significantly different patterns when the model relies on contextual knowledge
versus parametric knowledge to resolve conflicts. This pattern can be used
to estimate how an LLM will behave when a conflict occurs and to prevent
unexpected answers before they are produced. Our analysis offers insights into how
LLMs internally manage knowledge conflicts and provides a foundation for
developing methods to control the knowledge selection processes.
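
To make the probing idea in the abstract concrete, the following is a minimal sketch of a linear probe, not the authors' implementation. It assumes that one activation vector per example has already been extracted from a chosen layer of the residual stream; here those vectors are replaced by synthetic Gaussian data in which "conflict" examples are shifted along a single hidden direction. The function names, dimensions, shift size, and hyperparameters are all hypothetical and chosen only for illustration.

```python
import numpy as np


def make_synthetic_activations(n=400, d=32, shift=3.0, seed=0):
    """Synthetic stand-in for residual-stream activations (an assumption).

    Label 1 plays the role of "knowledge conflict present": those rows are
    shifted along one random unit direction. Real activations would come
    from a model's hidden states at a chosen layer.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    acts = rng.normal(size=(n, d)) + shift * labels[:, None] * direction
    return acts, labels


def train_linear_probe(x, y, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid of the logits
        w -= lr * (x.T @ (probs - y) / n + l2 * w)  # cross-entropy grad + L2
        b -= lr * np.mean(probs - y)
    return w, b


def probe_accuracy(x, y, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    preds = (x @ w + b > 0).astype(int)
    return float(np.mean(preds == y))


# Train on the first 300 examples and evaluate on the held-out remainder.
acts, labels = make_synthetic_activations()
split = 300
w, b = train_linear_probe(acts[:split], labels[:split])
test_acc = probe_accuracy(acts[split:], labels[split:], w, b)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

If a probe of this kind separates the two classes well on held-out data, the signal is linearly readable from the activations. The same setup could in principle be reused with labels that record whether the model answered from context or from parametric memory, which is the second question the abstract raises.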