Continuous Diffusion Model for Language Modeling
February 17, 2025
Authors: Jaehyeong Jo, Sung Ju Hwang
cs.AI
Abstract
Diffusion models have emerged as a promising alternative to autoregressive
models for modeling discrete categorical data. Yet diffusion models that
operate directly on discrete state spaces do not fully exploit the power of
iterative refinement, as signal is lost during transitions between
discrete states. Existing continuous diffusion models for discrete data show
limited performance compared to discrete approaches, and the unclear link
between the two restricts the development of diffusion models for discrete data.
In this work, we propose a continuous diffusion model for language modeling
that incorporates the geometry of the underlying categorical distribution. We
establish a connection between discrete diffusion and continuous flow on the
statistical manifold, and building on this analogy, we introduce a simple
design for the diffusion process that generalizes previous discrete diffusion
models. We further propose a simulation-free training framework based on radial
symmetry and a simple technique to address the high dimensionality of the
manifold. Comprehensive experiments on language modeling benchmarks and other
modalities show that our method outperforms existing discrete diffusion models
and approaches the performance of autoregressive models. Code is available at
https://github.com/harryjo97/RDLM.
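For intuition about the geometric connection the abstract alludes to, the sketch below illustrates a standard fact from information geometry: under the Fisher-Rao metric, the simplex of categorical distributions is isometric to a portion of the unit hypersphere via the square-root map, so a corruption process on discrete tokens can be realized as continuous motion along spherical geodesics. This is an illustrative sketch of the general idea, not the authors' implementation; all function and variable names are hypothetical, and the actual method is in the linked repository.

```python
# Minimal sketch (assumed, not the paper's code): map categorical
# distributions onto the unit sphere and noise a one-hot token toward
# the uniform distribution along a spherical geodesic.
import numpy as np

def simplex_to_sphere(p):
    """Square-root map: a categorical distribution p (sums to 1) to a unit vector."""
    return np.sqrt(p)

def sphere_to_simplex(u):
    """Inverse map: a point on the sphere's positive orthant back to the simplex."""
    return u ** 2

def geodesic(u0, u1, t):
    """Spherical interpolation (slerp) between unit vectors u0 and u1 at t in [0, 1]."""
    cos_theta = np.clip(np.dot(u0, u1), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:  # endpoints coincide; the geodesic is constant
        return u0
    return (np.sin((1 - t) * theta) * u0 + np.sin(t * theta) * u1) / np.sin(theta)

# Example: interpolate a one-hot token distribution toward uniform noise.
V = 8                                             # toy vocabulary size
p_data = np.eye(V)[3]                             # one-hot distribution for token id 3
u_data = simplex_to_sphere(p_data)                # a vertex of the positive orthant
u_unif = simplex_to_sphere(np.full(V, 1.0 / V))   # image of the uniform distribution

for t in [0.0, 0.5, 1.0]:
    p_t = sphere_to_simplex(geodesic(u_data, u_unif, t))
    print(t, np.round(p_t, 3))                    # moves from one-hot toward uniform
```

Because both endpoints lie in the sphere's positive orthant, every intermediate point squares back to a valid probability vector, which is what makes iterative refinement in this continuous space well defined.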