How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
February 16, 2025
Authors: Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen
cs.AI
Abstract
Despite their exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) leave a critical gap in our understanding of how they internalize new knowledge, particularly how they structurally embed acquired knowledge in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of the mechanisms of new knowledge acquisition in LLMs, but also provide potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
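A knowledge circuit, as used here, is a computational subgraph of the model: nodes are components such as attention heads and MLP blocks, and edges are kept when they contribute strongly to expressing a target fact. The snippet below is only a minimal conceptual sketch of that edge-selection step, not the authors' implementation; the function name, node labels, and scores are hypothetical, and it assumes per-edge attribution scores have already been computed by some upstream method.

```python
# Conceptual sketch: carve a "knowledge circuit" out of a model's computational
# graph by keeping the edges with the largest attribution scores for one fact.
# The edge_scores input is assumed to come from an attribution method; all
# names and values here are illustrative, not the paper's actual pipeline.
import networkx as nx


def extract_knowledge_circuit(edge_scores: dict[tuple[str, str], float],
                              top_k: int = 100) -> nx.DiGraph:
    """edge_scores maps (source_node, target_node) -> attribution score,
    where nodes denote model components such as attention heads or MLPs."""
    # Rank edges by the magnitude of their attribution to the target fact.
    ranked = sorted(edge_scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    circuit = nx.DiGraph()
    for (src, dst), score in ranked[:top_k]:
        circuit.add_edge(src, dst, score=score)
    return circuit


# Hypothetical usage with made-up scores for a single fact.
scores = {("a2.h3", "m5"): 0.81, ("m5", "a7.h1"): 0.64, ("a0.h0", "m1"): 0.02}
print(extract_knowledge_circuit(scores, top_k=2).edges(data=True))
```

Tracking how such a subgraph changes across continual pre-training checkpoints is what the abstract refers to as circuit evolution.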