Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding

Abstract

Transformers rely on both content-based and position-based addressing mechanisms to make predictions, but existing positional encoding techniques often diminish the effectiveness of position-based addressing. Many current methods enforce rigid patterns in attention maps, which limits their ability to model long-range dependencies and to adapt to diverse tasks. Additionally, most positional encodings are learned as general biases and lack the specialization required for different instances within a dataset. To address this, we propose conTextualized equivariAnt Position Embedding (TAPE), a novel framework that enhances positional embeddings by incorporating sequence content across layers. TAPE introduces dynamic, context-aware positional encodings, overcoming the constraints of traditional fixed patterns. By enforcing permutation and orthogonal equivariance, TAPE ensures the stability of positional encodings during updates, improving robustness and adaptability. Our method can be easily integrated into pre-trained transformers, offering parameter-efficient fine-tuning with minimal overhead. Extensive experiments show that TAPE achieves superior performance in language modeling, arithmetic reasoning, and long-context retrieval tasks compared to existing positional embedding techniques.
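
The two symmetries named in the abstract can be illustrated with a toy example. The sketch below is not TAPE's update rule, which the abstract does not specify. It uses a generic attention-style mixing step, assumed here only for illustration, and checks numerically that this step commutes with an orthogonal transform of the positional vectors and with a reordering of the tokens. The names `update`, `scores`, `P`, `R`, and `perm` are hypothetical.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def scores(p):
    """Pairwise inner products p @ p^T between positional vectors."""
    return matmul(p, transpose(p))

def softmax_rows(m):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def update(p):
    """Toy update (an assumption, not TAPE): softmax(p p^T) @ p."""
    return matmul(softmax_rows(scores(p)), p)

def permute(p, perm):
    """Reorder tokens (rows) according to perm."""
    return [p[i] for i in perm]

def close(a, b, tol=1e-9):
    """Element-wise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Three tokens with made-up 2-D positional vectors.
P = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.5]]

# A 2x2 rotation is a simple orthogonal matrix (R @ R^T = I).
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

perm = [2, 0, 1]

# Inner-product scores do not change when every vector is rotated.
invariant = close(scores(matmul(P, R)), scores(P))

# Orthogonal equivariance: update(P @ R) equals update(P) @ R.
orth_equivariant = close(update(matmul(P, R)), matmul(update(P), R))

# Permutation equivariance: reordering tokens reorders the output the same way.
perm_equivariant = close(update(permute(P, perm)), permute(update(P), perm))

print(invariant, orth_equivariant, perm_equivariant)
```

In this toy setting, both properties follow because the mixing weights depend only on inner products between positional vectors, which an orthogonal transform preserves and a permutation merely reorders.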

