Linear Correlation in LM's Compositional Generalization and Hallucination

Abstract

The generalization of language models (LMs) is under active debate, contrasting their potential for general intelligence with their struggles with basic knowledge composition (e.g., the reverse/transition curse). This paper uncovers the phenomenon of linear correlations in LMs during knowledge composition. Specifically, between certain related pieces of knowledge there exists a linear transformation that maps the next-token prediction logits from one prompt to another, e.g., "X lives in the city of" → "X lives in the country of" for every given X. This mirrors the linearity in human knowledge composition, such as Paris → France. Our findings indicate that the linear transformation is resilient to large-scale fine-tuning, generalizing updated knowledge when aligned with real-world relationships, but causing hallucinations when it deviates. Empirical results suggest that linear correlation can serve as a potential identifier of LM generalization. Finally, we show that such linear correlations can be learned with a single feedforward network and pre-trained vocabulary representations, indicating that LM generalization relies heavily on the latter.
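To make the idea of a linear map between next-token logits concrete, the following is a minimal sketch, not the authors' method or code. It assumes we already have logit vectors for a source prompt and a target prompt across many subjects X; here those logits are synthetic random data, and the names `fit_linear_map` and `pearson` are hypothetical helpers introduced only for illustration. The sketch fits W and b by ordinary least squares and then checks how well the fitted map correlates with held-out target logits.

```python
import numpy as np

def fit_linear_map(src_logits, tgt_logits):
    """Least-squares fit of tgt ≈ src @ W + b (hypothetical helper)."""
    n = src_logits.shape[0]
    # Append a column of ones so the bias is fitted together with W.
    X = np.hstack([src_logits, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(X, tgt_logits, rcond=None)
    return coef[:-1], coef[-1]  # W has shape (n_src, n_tgt), b has shape (n_tgt,)

def pearson(a, b):
    """Pearson correlation between two arrays, flattened."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for LM logits (assumption: real logits would come
# from running a model on prompts such as "X lives in the city of").
rng = np.random.default_rng(0)
n_subjects, n_src, n_tgt = 200, 8, 5
true_W = rng.normal(size=(n_src, n_tgt))
true_b = rng.normal(size=n_tgt)
src = rng.normal(size=(n_subjects, n_src))
tgt = src @ true_W + true_b + 0.1 * rng.normal(size=(n_subjects, n_tgt))

# Fit on some subjects, evaluate on held-out subjects.
train, test = slice(0, 150), slice(150, None)
W, b = fit_linear_map(src[train], tgt[train])
pred = src[test] @ W + b
corr = pearson(pred, tgt[test])
print(f"held-out Pearson correlation: {corr:.3f}")
```

With real model logits, a high held-out correlation would be one way to read the abstract's claim that a single linear transformation relates the two prompts for every X; a low correlation would suggest no such linear relation for that prompt pair.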
