Graph Generative Pre-trained Transformer

Abstract

Graph generation is a critical task in numerous domains, including molecular design and social network analysis, because graphs can model complex relationships and structured data. While most modern graph generative models use adjacency matrix representations, this work revisits an alternative approach that represents a graph as a sequence of its node set and edge set. We advocate this approach because it encodes graphs efficiently, and we propose a novel representation built on it. Based on this representation, we introduce the Graph Generative Pre-trained Transformer (G2PT), an auto-regressive model that learns graph structures via next-token prediction. To further exploit G2PT's capabilities as a general-purpose foundation model, we explore fine-tuning strategies for two downstream applications: goal-oriented generation and graph property prediction. We conduct extensive experiments across multiple datasets. The results show that G2PT achieves superior generative performance on both generic graph and molecule datasets. Furthermore, G2PT exhibits strong adaptability and versatility in downstream tasks ranging from molecular design to property prediction.
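
The abstract does not specify G2PT's exact token vocabulary, so the following is only a minimal sketch under stated assumptions. It serializes a graph as a node-set segment followed by an edge-set segment, using a hypothetical format (`idx_i` index tokens, `<bos>`/`<sep>`/`<eos>` markers) that is not taken from the paper. It also shows how next-token prediction pairs are formed and why such a sequence can be more compact than an adjacency matrix for sparse graphs: its length grows with the number of nodes plus edges, while the matrix always has n² entries.

```python
from typing import List, Tuple


def graph_to_sequence(node_labels: List[str],
                      edges: List[Tuple[int, int, str]]) -> List[str]:
    """Serialize a graph as a node-set segment, then an edge-set segment.

    Hypothetical token format (an assumption, not G2PT's published scheme):
    each node i becomes ["idx_i", label]; each edge becomes
    ["idx_src", "idx_dst", edge_label].
    """
    tokens = ["<bos>"]
    for i, label in enumerate(node_labels):
        tokens += [f"idx_{i}", label]
    tokens.append("<sep>")  # hypothetical boundary between node and edge sets
    for src, dst, label in edges:
        tokens += [f"idx_{src}", f"idx_{dst}", label]
    tokens.append("<eos>")
    return tokens


def next_token_pairs(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Inputs and targets for auto-regressive training: targets are inputs shifted by one."""
    return tokens[:-1], tokens[1:]


def adjacency_entries(num_nodes: int) -> int:
    """Number of entries in a dense adjacency matrix, for comparison."""
    return num_nodes * num_nodes


if __name__ == "__main__":
    # A 3-atom chain C-C-O (labels are illustrative only).
    seq = graph_to_sequence(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")])
    print(seq)
    inputs, targets = next_token_pairs(seq)
    print(inputs[0], "->", targets[0])  # <bos> -> idx_0

    # A sparse path graph with 100 nodes and 99 edges.
    path = graph_to_sequence(["A"] * 100, [(i, i + 1, "-") for i in range(99)])
    print(len(path), "tokens vs", adjacency_entries(100), "adjacency entries")  # 500 vs 10000
```

For very small graphs the sequence can be longer than the matrix (15 tokens versus 9 entries for the 3-node chain), so the efficiency argument applies mainly to larger, sparse graphs.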