How new data permeates LLM knowledge and how to dilute it
April 13, 2025
Authors: Chen Sun, Renat Aksitov, Andrey Zhmoginov, Nolan Andrew Miller, Max Vladymyrov, Ulrich Rueckert, Been Kim, Mark Sandler
cs.AI
Abstract
Large language models learn, and continually learn, through the accumulation of
gradient-based updates, but how individual pieces of new information affect
existing knowledge, leading to both beneficial generalization and problematic
hallucination, remains poorly understood. We demonstrate that when learning new
information, LLMs exhibit a "priming" effect: learning a new fact can cause the
model to inappropriately apply that knowledge in unrelated contexts. To
systematically study this phenomenon, we introduce "Outlandish," a carefully
curated dataset of 1320 diverse text samples designed to probe how new
knowledge permeates through an LLM's existing knowledge base. Using this
dataset, we show that the degree of priming after learning new information can
be predicted by measuring the token probability of key words before learning.
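To make that measurement concrete, here is a minimal sketch of probing a keyword's pre-learning token probability with a Hugging Face causal LM; the model name, probe sentence, and keyword below are illustrative stand-ins, not the paper's exact protocol.

```python
# Minimal sketch of the "measure before learning" idea, assuming a
# Hugging Face causal LM. The model, probe sentence, and keyword are
# illustrative placeholders rather than the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper studies PALM-2, Gemma, and Llama
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def keyword_probability(context: str, keyword: str) -> float:
    """Probability the model assigns to `keyword`'s first token after `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    # Leading space so the keyword is tokenized as a separate word.
    kw_id = tokenizer(" " + keyword, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(ctx_ids).logits[0, -1]  # next-token logits
    return torch.softmax(logits, dim=-1)[kw_id].item()

# A low pre-learning probability (a "surprising" keyword) is the signal the
# abstract reports as predictive of stronger priming after a gradient update.
p = keyword_probability("The color of the sand on this beach is", "vermilion")
print(f"P(keyword before learning) = {p:.2e}")
```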
This relationship holds robustly across different model architectures (PALM-2,
Gemma, Llama), sizes, and training stages. Finally, we develop two novel
techniques to modulate how new knowledge affects existing model behavior: (1) a
"stepping-stone" text augmentation strategy and (2) an "ignore-k" update
pruning method. These approaches reduce undesirable priming effects by 50-95%
while preserving the model's ability to learn new information. Our findings
provide both empirical insights into how LLMs learn and practical tools for
improving the specificity of knowledge insertion in language models. Further
materials: https://sunchipsster1.github.io/projects/outlandish/
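For the "ignore-k" idea, the following is a minimal sketch under one plausible reading of the abstract: before a parameter update is applied, the k largest-magnitude entries of the proposed change are zeroed out. The plain-SGD step, the `k_frac` fraction, and the `ignore_k_update` helper are illustrative assumptions, not the paper's implementation.

```python
# Sketch of "ignore-k" style update pruning, assuming one plausible reading:
# drop (zero out) the k parameters with the largest-magnitude proposed
# changes before applying the update. Hyperparameters are illustrative.
import torch

def ignore_k_update(param: torch.Tensor, grad: torch.Tensor,
                    lr: float = 1e-3, k_frac: float = 0.05) -> None:
    """Apply an SGD step with the top `k_frac` largest-magnitude updates pruned."""
    update = lr * grad
    k = max(1, int(k_frac * update.numel()))
    # Indices of the k largest |update| entries (flattened view).
    topk = torch.topk(update.abs().flatten(), k).indices
    mask = torch.ones_like(update).flatten()
    mask[topk] = 0.0  # ignore the k biggest updates
    param.data -= (update.flatten() * mask).view_as(update)

# Usage on a toy parameter:
w = torch.nn.Parameter(torch.randn(10, 10))
g = torch.randn_like(w)  # stand-in for a real gradient
ignore_k_update(w, g, lr=0.1, k_frac=0.05)
```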