Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers

Abstract

We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism, merging concepts from graph neural networks and language modeling. Building on the inherent connection between attention and graph theory, we reformulate the Transformer's attention mechanism as a graph operation and propose Graph-Aware Isomorphic Attention. This method leverages advanced graph modeling strategies, including Graph Isomorphism Networks (GIN) and Principal Neighborhood Aggregation (PNA), to enrich the representation of relational structures. Our approach captures complex dependencies and generalizes across tasks, as evidenced by a reduced generalization gap and improved learning performance. Additionally, we extend the concept of graph-aware attention to introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs. By interpreting attention matrices as sparse adjacency graphs, this technique enhances the adaptability of pre-trained foundational models with minimal computational overhead, endowing them with graph-aware capabilities. Sparse GIN-Attention fine-tuning achieves improved training dynamics and better generalization compared to alternative methods such as low-rank adaptation (LoRA). We discuss latent graph-like structures within traditional attention mechanisms, offering a new lens through which Transformers can be understood by evolving them into hierarchical GIN models for relational reasoning. This perspective suggests profound implications for foundational model development, enabling the design of architectures that dynamically adapt to both local and global dependencies. Applications in bioinformatics, materials science, language modeling, and beyond could benefit from this synthesis of relational and sequential data modeling, setting the stage for interpretable and generalizable modeling strategies.
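The idea of reading an attention matrix as a sparse adjacency graph and aggregating over it with a GIN-style update can be illustrated with a small sketch. This is not the authors' implementation, since the abstract gives no architectural details. The thresholding rule, the `eps` value, the helper names, and the fixed ReLU standing in for a learned MLP are all assumptions made for illustration. Only the update rule h_i' = MLP((1 + eps) * h_i + sum over neighbors of h_j) is the standard GIN formulation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys):
    """Standard scaled dot-product attention weights (row-wise softmax)."""
    scale = math.sqrt(len(queries[0]))
    return [
        softmax([sum(q * k for q, k in zip(qi, kj)) / scale for kj in keys])
        for qi in queries
    ]


def sparsify(attn, threshold):
    """Assumption: keep edge i -> j only where the weight exceeds `threshold`."""
    return [[1.0 if w > threshold else 0.0 for w in row] for row in attn]


def gin_update(adj, feats, eps, mlp):
    """GIN-style update: h_i' = mlp((1 + eps) * h_i + sum_j adj[i][j] * h_j)."""
    out = []
    for i, h_i in enumerate(feats):
        agg = [(1.0 + eps) * x for x in h_i]
        for j, h_j in enumerate(feats):
            if adj[i][j]:
                agg = [a + adj[i][j] * x for a, x in zip(agg, h_j)]
        out.append(mlp(agg))
    return out


def relu_mlp(vec):
    """Hypothetical stand-in for a learned MLP: a plain ReLU."""
    return [max(0.0, x) for x in vec]


# Three toy token embeddings, used as both queries and keys.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = attention_matrix(tokens, tokens)
adj = sparsify(attn, threshold=0.3)
updated = gin_update(adj, tokens, eps=0.0, mlp=relu_mlp)
```

In this toy setup each token keeps only its strongest attention links as graph edges, and the GIN step sums the features of those neighbors. A token that attends to itself above the threshold is counted both through the (1 + eps) term and through its self edge. A trained model would replace `relu_mlp` with learned layers and would likely choose the sparsification rule differently.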
