

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

February 16, 2025
作者: Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen
cs.AI

Abstract

Despite exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge, particularly how to structurally embed acquired knowledge in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of the mechanisms of new knowledge acquisition in LLMs, but also provide potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
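The "knowledge circuits" referenced above are sparse computational subgraphs, components such as attention heads and MLP blocks plus the edges between them, whose retention is sufficient for the model to recall a given fact. As a rough, hedged illustration of how such a subgraph might be extracted (not the paper's actual attribution method), the Python sketch below scores each candidate edge by how much a fact-recall metric drops when that single edge is ablated, then keeps the highest-impact fraction. The edge representation, the recall_score callback, and the keep_ratio threshold are hypothetical placeholders.

from typing import Callable, Dict, List, Set, Tuple

# An "edge" connects two model components, e.g. an attention head feeding an MLP.
Edge = Tuple[str, str]


def extract_circuit(
    edges: List[Edge],
    recall_score: Callable[[Set[Edge]], float],
    keep_ratio: float = 0.1,
) -> Set[Edge]:
    """Toy circuit extraction: rank each edge by the drop in a fact-recall
    score when that edge alone is ablated, then keep the top fraction."""
    full: Set[Edge] = set(edges)
    baseline = recall_score(full)
    impact: Dict[Edge, float] = {}
    for edge in edges:
        # A larger drop in recall means the edge matters more for this fact.
        impact[edge] = baseline - recall_score(full - {edge})
    k = max(1, int(len(edges) * keep_ratio))
    ranked = sorted(edges, key=lambda e: impact[e], reverse=True)
    return set(ranked[:k])

In a realistic setup, recall_score would run the model with the excluded edges zeroed or mean-ablated on prompts probing a single fact and return the probability of the correct answer; re-extracting the circuit at successive continual pre-training checkpoints is the kind of analysis that could surface the formation-to-optimization phase shift and deep-to-shallow trends described in the abstract.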

