Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers

Abstract

We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the Transformer's attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach captures complex dependencies and generalizes across tasks, as evidenced by a reduced generalization gap and improved learning performance. Additionally, we expand the concept of graph-aware attention to introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs. By interpreting attention matrices as sparse adjacency graphs, this technique enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning achieves improved training dynamics and better generalization compared to alternative methods such as low-rank adaptation (LoRA). We discuss latent graph-like structures within traditional attention mechanisms and, by evolving Transformers into hierarchical GIN models for relational reasoning, offer a new lens through which they can be understood. This perspective suggests profound implications for foundational model development, enabling the design of architectures that dynamically adapt to both local and global dependencies. Applications in bioinformatics, materials science, language modeling, and beyond could benefit from this synthesis of relational and sequential data modeling, setting the stage for interpretable and generalizable modeling strategies.
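To make the idea of reading an attention matrix as a sparse adjacency graph more concrete, the following is a minimal illustrative sketch in plain Python, not the authors' implementation. It assumes that sparsification is done with a simple threshold and that the GIN update takes the standard form (1 + eps) * h_i plus a sum over neighbors; the abstract specifies neither. All function names (`attention_matrix`, `sparsify`, `gin_aggregate`) and the `update` callable standing in for the learnable MLP are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys):
    """Row-wise softmax of scaled dot products: softmax(Q K^T / sqrt(d))."""
    d = len(queries[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]


def sparsify(attn, threshold):
    """Assumed sparsification rule: keep entries >= threshold as weighted
    edges of an adjacency matrix and zero out everything else."""
    return [[a if a >= threshold else 0.0 for a in row] for row in attn]


def gin_aggregate(adj, feats, eps=0.0, update=None):
    """GIN-style node update: update((1 + eps) * h_i + sum_j adj[i][j] * h_j).

    `update` is a hypothetical stand-in for the learnable MLP of a GIN layer;
    when it is None the aggregated vector is returned unchanged.
    """
    dim = len(feats[0])
    out = []
    for i, row in enumerate(adj):
        # Weighted sum of neighbor features along each feature dimension.
        neigh = [sum(row[j] * feats[j][c] for j in range(len(feats))) for c in range(dim)]
        node = [(1.0 + eps) * feats[i][c] + neigh[c] for c in range(dim)]
        out.append(update(node) if update is not None else node)
    return out


if __name__ == "__main__":
    # Three toy token embeddings, used as queries, keys, and node features.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    attn = attention_matrix(tokens, tokens)
    adj = sparsify(attn, threshold=0.3)
    relu = lambda v: [max(0.0, x) for x in v]
    for row in gin_aggregate(adj, tokens, eps=0.1, update=relu):
        print(row)
```

In this sketch the attention weights act as edge weights of a graph over tokens, and the threshold controls how sparse that graph is. A trainable version would replace `update` with a small MLP and learn `eps`, as in a standard GIN layer.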
