
Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers

January 4, 2025
Author: Markus J. Buehler
cs.AI

Abstract

We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the Transformer's attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach captures complex dependencies and generalizes across tasks, as evidenced by a reduced generalization gap and improved learning performance. Additionally, we expand the concept of graph-aware attention to introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs. By interpreting attention matrices as sparse adjacency graphs, this technique enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning achieves improved training dynamics and better generalization compared to alternative methods like low-rank adaptation (LoRA). We discuss latent graph-like structures within traditional attention mechanisms, offering a new lens through which Transformers can be understood. By evolving Transformers into hierarchical GIN models for relational reasoning, this perspective suggests profound implications for foundational model development, enabling the design of architectures that dynamically adapt to both local and global dependencies. Applications in bioinformatics, materials science, language modeling, and beyond could benefit from this synthesis of relational and sequential data modeling, setting the stage for interpretable and generalizable modeling strategies.
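
To make the central idea concrete, the sketch below illustrates how an attention matrix can be reinterpreted as a sparse adjacency graph and refined with a GIN-style node update, in the spirit of the Sparse GIN-Attention described above. This is a minimal illustration under stated assumptions, not the authors' implementation; the class name, sparsification threshold, and MLP configuration are all hypothetical.

```python
# Minimal sketch (illustrative only): treat Transformer attention weights as a sparse
# adjacency graph and apply a GIN-style update to token features. The threshold and
# module structure are assumptions for illustration, not the paper's exact method.
import torch
import torch.nn as nn


class SparseGINAttentionSketch(nn.Module):
    def __init__(self, dim: int, threshold: float = 0.1, eps: float = 0.0):
        super().__init__()
        self.threshold = threshold                       # assumed sparsification cutoff
        self.eps = nn.Parameter(torch.tensor(eps))       # learnable GIN epsilon
        self.mlp = nn.Sequential(                        # GIN update MLP
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, attn: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # attn: (batch, seq, seq) attention weights; x: (batch, seq, dim) token features
        adj = torch.where(attn >= self.threshold, attn, torch.zeros_like(attn))  # sparse adjacency
        neighbor_sum = adj @ x                           # aggregate features over retained edges
        return self.mlp((1.0 + self.eps) * x + neighbor_sum)  # GIN-style node update
```

In a fine-tuning setting, such a module could in principle be attached alongside frozen attention layers, analogous to how LoRA adapters are added, so that only the graph-aware update is trained; this follow-up note is likewise an interpretation of the abstract, not a statement of the paper's exact procedure.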
