NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
February 18, 2025
Authors: Zhiyuan Liu, Yanchen Luo, Han Huang, Enzhi Zhang, Sihang Li, Junfeng Fang, Yaorui Shi, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
cs.AI
Abstract
3D molecule generation is crucial for drug discovery and material design.
While prior efforts focus on 3D diffusion models for their benefits in modeling
continuous 3D conformers, they overlook the advantages of 1D SELFIES-based
Language Models (LMs), which can generate 100% valid molecules and leverage
billion-scale 1D molecule datasets. To combine these advantages for 3D molecule
generation, we propose a foundation model -- NExT-Mol: 3D Diffusion Meets 1D
Language Modeling for 3D Molecule Generation. NExT-Mol uses an extensively
pretrained molecule LM for 1D molecule generation, and subsequently predicts
the generated molecule's 3D conformers with a 3D diffusion model. We enhance
NExT-Mol's performance by scaling up the LM's model size, refining the
diffusion neural architecture, and applying 1D to 3D transfer learning.
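To make the second stage more concrete, the following is a minimal sketch of the generic closed-form forward noising used to train DDPM-style diffusion models on 3D coordinates. It is an illustration under stated assumptions, not NExT-Mol's diffusion architecture: the linear beta schedule, the centroid removal (a common choice for translation invariance in 3D molecular diffusion), and the helper names `linear_alpha_bars`, `center`, `gaussian_noise`, and `q_sample` are all hypothetical. A conformer denoiser conditioned on the already generated 1D molecule would then be trained to recover the noise from the noised coordinates and the timestep.

```python
import math
import random

# A conformer as N atoms x 3 Cartesian coordinates (x, y, z).
Coords = list[list[float]]


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed, not from the paper)."""
    alpha_bars: list[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def center(coords: Coords) -> Coords:
    """Subtract the centroid so the conformer has zero mean position (translation invariance)."""
    n = len(coords)
    mean = [sum(p[d] for p in coords) / n for d in range(3)]
    return [[p[d] - mean[d] for d in range(3)] for p in coords]


def gaussian_noise(n_atoms: int, rng: random.Random) -> Coords:
    """Standard Gaussian noise per coordinate, projected to zero mean like the conformer."""
    return center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)])


def q_sample(x0: Coords, t: int, alpha_bars: list[float], noise: Coords) -> Coords:
    """Closed-form forward step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    return [[a * p[d] + s * e[d] for d in range(3)] for p, e in zip(x0, noise)]


# Hypothetical usage on a toy 3-atom geometry (coordinates in angstroms):
x0 = center([[0.0, 0.0, 0.0], [1.09, 0.0, 0.0], [-0.36, 1.03, 0.0]])
alpha_bars = linear_alpha_bars(1000)
eps = gaussian_noise(len(x0), random.Random(0))
x_t = q_sample(x0, 500, alpha_bars, eps)
# A denoiser conditioned on the molecule would be trained to predict `eps` from (x_t, t).
```

The closed form lets training draw any timestep directly instead of simulating the whole noising chain; at sampling time the process runs in reverse, starting from pure noise and denoising step by step into a conformer.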
Notably, our 1D molecule LM significantly outperforms baselines in
distributional similarity while ensuring validity, and our 3D diffusion model
achieves leading performance in conformer prediction. Given these improvements
in 1D and 3D modeling, NExT-Mol achieves a 26% relative improvement in 3D FCD
for de novo 3D generation on GEOM-DRUGS, and a 13% average relative gain for
conditional 3D generation on QM9-2014. Our code and pretrained checkpoints are
available at https://github.com/acharkq/NExT-Mol.
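The headline numbers are relative changes. For a lower-is-better metric such as FCD, a relative improvement is the reduction divided by the baseline value, and an "average relative gain" is plausibly the mean of such per-metric ratios. That averaging scheme is an assumption, as are the helper names `relative_improvement` and `mean_relative_gain`; the values in the example are illustrative only and are not taken from the paper.

```python
def relative_improvement(baseline: float, ours: float, lower_is_better: bool = True) -> float:
    """Fractional improvement of `ours` over `baseline` (0.26 means 26%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # For lower-is-better metrics (e.g. FCD, error metrics), improvement is a reduction.
    delta = baseline - ours if lower_is_better else ours - baseline
    return delta / abs(baseline)


def mean_relative_gain(pairs: list[tuple[float, float]], lower_is_better: bool = True) -> float:
    """Average of per-metric relative improvements (hypothetical averaging scheme)."""
    if not pairs:
        raise ValueError("need at least one (baseline, ours) pair")
    gains = [relative_improvement(b, o, lower_is_better) for b, o in pairs]
    return sum(gains) / len(gains)


# Illustrative values only, not results from the paper:
print(f"{relative_improvement(1.00, 0.74):.0%}")  # prints 26%
```

Dividing by the baseline makes gains comparable across metrics with different scales, which is why averaging the ratios, rather than the raw differences, is the natural reading of an average relative gain.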