Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
November 23, 2024
Authors: Ali Behrouz, Ali Parviz, Mahdi Karami, Clayton Sanford, Bryan Perozzi, Vahab Mirrokni
cs.AI
Abstract
Modern sequence models (e.g., Transformers, linear RNNs) have emerged as dominant backbones of recent deep learning frameworks, mainly due to their efficiency, representational power, and/or ability to capture long-range dependencies. Adopting these sequence models for graph-structured data has recently gained popularity as an alternative to Message Passing Neural Networks (MPNNs). There is, however, no common foundation for what constitutes a good graph sequence model, nor a mathematical description of the benefits and deficiencies of adopting different sequence models for learning on graphs. To this end, we first present the Graph Sequence Model (GSM), a unifying framework for adopting sequence models for graphs, consisting of three main steps: (1) Tokenization, which translates the graph into a set of sequences; (2) Local Encoding, which encodes the local neighborhood around each node; and (3) Global Encoding, which employs a scalable sequence model to capture long-range dependencies within the sequences. This framework allows us to understand, evaluate, and compare the power of different sequence-model backbones in graph tasks. Our theoretical evaluation of the representational power of Transformers and modern recurrent models, through the lens of global and local graph tasks, shows that both types of models have positive and negative sides. Building on this observation, we present GSM++, a fast hybrid model that uses the Hierarchical Affinity Clustering (HAC) algorithm to tokenize the graph into hierarchical sequences and then employs a hybrid Transformer architecture to encode them. Our theoretical and experimental results support the design of GSM++, showing that it outperforms baselines on most benchmark evaluations.