
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

March 12, 2025
Authors: Marianne Arriola, Aaron Gokaslan, Justin T Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov
cs.AI

Abstract

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state of the art among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and a blog post, on the project page: https://m-arriola.com/bd3lms/
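
The core sampling idea lends itself to a short sketch. Below is a minimal, self-contained illustration (not the paper's implementation): blocks are produced left to right as in an autoregressive model, while tokens within each block are denoised in parallel over a few masked-diffusion steps. The `denoiser` here is a random-logit placeholder standing in for a trained network, and the uniform unmasking schedule is an illustrative choice; the paper instead learns data-driven noise schedules, and the outer block loop is where KV caching of previously generated blocks pays off.

```python
import torch

# Toy dimensions; MASK is an id outside the real vocabulary.
VOCAB, BLOCK, NUM_BLOCKS, STEPS = 1000, 16, 4, 8
MASK = VOCAB

def denoiser(context, block):
    """Placeholder network: returns random logits over the vocabulary for
    every position of the current (partially masked) block. A real model
    would condition on `context` (all previously generated blocks, whose
    keys/values can be cached) and on the visible tokens of `block`."""
    return torch.randn(block.shape[0], VOCAB)

@torch.no_grad()
def sample_block_diffusion():
    sequence = torch.empty(0, dtype=torch.long)
    for _ in range(NUM_BLOCKS):                  # outer loop: autoregressive over blocks
        block = torch.full((BLOCK,), MASK)       # start the block fully masked
        for step in range(STEPS):                # inner loop: parallel denoising
            logits = denoiser(sequence, block)
            proposal = torch.distributions.Categorical(logits=logits).sample()
            # Reveal a growing fraction of masked positions each step
            # (a uniform schedule; the paper learns data-driven schedules).
            target_unmasked = BLOCK * (step + 1) // STEPS
            masked = (block == MASK).nonzero().squeeze(-1)
            n_reveal = target_unmasked - (BLOCK - len(masked))
            reveal = masked[torch.randperm(len(masked))[:n_reveal]]
            block[reveal] = proposal[reveal]
        sequence = torch.cat([sequence, block])  # block is clean; append and continue
    return sequence

print(sample_block_diffusion().shape)  # torch.Size([64])
```

The block size controls the interpolation: a block of one token recovers ordinary autoregressive sampling, while a single block spanning the whole sequence recovers standard fixed-length diffusion.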
