Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Abstract

Humans distill complex experiences into fundamental abstractions that enable rapid learning and adaptation. Similarly, autoregressive transformers exhibit adaptive learning through in-context learning (ICL), which raises the question of how. In this paper, we propose a concept encoding-decoding mechanism to explain ICL by studying how transformers form and use internal abstractions in their representations. On synthetic ICL tasks, we analyze the training dynamics of a small transformer and report the coupled emergence of concept encoding and decoding. As the model learns to encode different latent concepts (e.g., "Finding the first noun in a sentence") into distinct, separable representations, it concurrently builds conditional decoding algorithms and improves its ICL performance. We validate the existence of this mechanism across pretrained models of varying scales (Gemma-2 2B/9B/27B, Llama-3.1 8B/70B). Further, through mechanistic interventions and controlled finetuning, we demonstrate that the quality of concept encoding is causally related to and predictive of ICL performance. Our empirical insights shed light on the success and failure modes of large language models via their representations.
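To make the idea of "distinct, separable representations" concrete, the following is a minimal sketch of one way separability could be scored. It is not the procedure used in the paper: the function names (`concept_decodability`, `toy_representations`) are hypothetical, random Gaussian vectors stand in for a model's hidden states, and leave-one-out nearest-centroid accuracy is an assumed stand-in for whatever measure of concept encoding quality the authors use.

```python
import random
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def concept_decodability(reps, labels):
    """Leave-one-out nearest-centroid accuracy.

    Each representation is assigned to the concept whose centroid
    (computed without that representation) is closest. A higher
    score means the latent concepts are easier to tell apart.
    """
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)

    correct = 0
    for i, (vec, lab) in enumerate(zip(reps, labels)):
        best_label, best_dist = None, float("inf")
        for cand, idxs in by_label.items():
            members = [reps[j] for j in idxs if j != i]
            if not members:
                continue
            d = sq_dist(vec, centroid(members))
            if d < best_dist:
                best_label, best_dist = cand, d
        if best_label == lab:
            correct += 1
    return correct / len(reps)


def toy_representations(num_concepts, per_concept, dim, separation, seed=0):
    """Hypothetical stand-in for hidden states: one Gaussian cluster per concept.

    `separation` scales how far apart the cluster means are;
    0.0 makes all concepts share the same mean.
    """
    rng = random.Random(seed)
    reps, labels = [], []
    for c in range(num_concepts):
        mean = [separation * rng.gauss(0, 1) for _ in range(dim)]
        for _ in range(per_concept):
            reps.append([m + rng.gauss(0, 1) for m in mean])
            labels.append(c)
    return reps, labels


if __name__ == "__main__":
    for sep in (0.0, 1.0, 10.0):
        reps, labels = toy_representations(4, 30, 8, separation=sep)
        print(f"separation={sep}: decodability={concept_decodability(reps, labels):.2f}")
```

In this toy setting, widely separated clusters should score close to 1.0 and overlapping clusters should score near or below chance. With a real model, the vectors would instead be hidden states collected at a chosen layer and token position, which this sketch leaves unspecified.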
