Best of Both Worlds: Advantages of Hybrid Graph Sequence Models

November 23, 2024
Authors: Ali Behrouz, Ali Parviz, Mahdi Karami, Clayton Sanford, Bryan Perozzi, Vahab Mirrokni
cs.AI

Abstract

Modern sequence models (e.g., Transformers, linear RNNs) have emerged as the dominant backbones of recent deep learning frameworks, mainly due to their efficiency, representational power, and/or ability to capture long-range dependencies. Adopting these sequence models for graph-structured data has recently gained popularity as an alternative to Message Passing Neural Networks (MPNNs). There is, however, no common foundation for what constitutes a good graph sequence model, nor a mathematical description of the benefits and deficiencies of adopting different sequence models for learning on graphs. To this end, we first present the Graph Sequence Model (GSM), a unifying framework for adopting sequence models for graphs, consisting of three main steps: (1) Tokenization, which translates the graph into a set of sequences; (2) Local Encoding, which encodes the local neighborhood around each node; and (3) Global Encoding, which employs a scalable sequence model to capture long-range dependencies within the sequences. This framework allows us to understand, evaluate, and compare the power of different sequence-model backbones on graph tasks. Our theoretical evaluation of the representational power of Transformers and modern recurrent models through the lens of global and local graph tasks shows that both types of models have strengths and weaknesses. Building on this observation, we present GSM++, a fast hybrid model that uses the Hierarchical Affinity Clustering (HAC) algorithm to tokenize the graph into hierarchical sequences and then employs a hybrid Transformer architecture to encode these sequences. Our theoretical and experimental results support the design of GSM++, showing that it outperforms baselines in most benchmark evaluations.
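To make the three-step GSM pipeline concrete, here is a minimal PyTorch sketch. It assumes a random-walk tokenizer, a GRU for local encoding, and a Transformer encoder for global encoding; every class name and modeling choice here is an illustrative assumption, not the authors' implementation.

```python
# A minimal sketch of the GSM pipeline: (1) tokenize the graph into
# sequences, (2) encode local neighborhoods, (3) encode long-range
# dependencies with a scalable sequence model. Illustrative only.
import torch
import torch.nn as nn

class GraphSequenceModel(nn.Module):
    def __init__(self, num_nodes, dim=64, walk_len=8):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, dim)        # node features (assumed learnable)
        self.local = nn.GRU(dim, dim, batch_first=True)  # (2) Local Encoding
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=2)  # (3) Global Encoding
        self.walk_len = walk_len

    def tokenize(self, adj):
        # (1) Tokenization: one random walk per node -> a set of sequences.
        walks = []
        for v in range(adj.size(0)):
            walk, cur = [v], v
            for _ in range(self.walk_len - 1):
                nbrs = adj[cur].nonzero().flatten()
                cur = nbrs[torch.randint(len(nbrs), (1,))].item() if len(nbrs) else cur
                walk.append(cur)
            walks.append(walk)
        return torch.tensor(walks)                       # (num_nodes, walk_len)

    def forward(self, adj):
        seqs = self.embed(self.tokenize(adj))            # sequences of node embeddings
        local, _ = self.local(seqs)                      # encode each local neighborhood
        out = self.global_enc(local)                     # capture long-range dependencies
        return out[:, 0]                                 # representation of each walk's start node

# Usage on a toy 4-cycle graph:
adj = torch.tensor([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], dtype=torch.float)
model = GraphSequenceModel(num_nodes=4)
print(model(adj).shape)  # torch.Size([4, 64])
```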
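The hierarchical tokenization of GSM++ can be sketched in the same spirit. The snippet below is a stand-in only: it uses SciPy's average-linkage agglomerative clustering in place of the paper's HAC algorithm, and represents each node's token sequence as its chain of cluster memberships from coarse to fine.

```python
# An illustrative sketch of hierarchical tokenization: nodes are merged
# bottom-up by affinity (SciPy's average linkage as a stand-in for HAC),
# and each node's token sequence lists its cluster id at every level of
# the hierarchy, coarse to fine, ending with the node itself.
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

def hierarchical_tokens(adj: np.ndarray, levels=(1, 2, 4)):
    # Affinity = adjacency; convert to a distance for linkage.
    dist = 1.0 - adj / (adj.max() + 1e-9)
    np.fill_diagonal(dist, 0.0)
    iu = np.triu_indices_from(dist, k=1)         # condensed distance form
    Z = linkage(dist[iu], method="average")
    # cut_tree gives each node's cluster id at the requested granularities.
    cuts = cut_tree(Z, n_clusters=list(levels))  # (num_nodes, len(levels))
    # Token sequence per node: [root cluster, ..., finest cluster, node id].
    return [list(row) + [v] for v, row in enumerate(cuts)]

# Usage on a small toy graph:
adj = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], dtype=float)
for v, toks in enumerate(hierarchical_tokens(adj)):
    print(f"node {v}: {toks}")
```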
