How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

Abstract

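Despite the exceptional capabilities of Large Language Models (LLMs) in knowledge-intensive tasks, there remains a critical gap in understanding how they internalize new knowledge, particularly how acquired knowledge is structurally embedded in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying the computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance the theoretical understanding of how LLMs acquire new knowledge, but also carry potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.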
