
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

March 12, 2025
Authors: Marianne Arriola, Aaron Gokaslan, Justin T Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, Volodymyr Kuleshov
cs.AI

Abstract

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state of the art among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences. We provide the code, along with the model weights and a blog post, on the project page: https://m-arriola.com/bd3lms/
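
The core sampling idea lends itself to a short sketch. Below is a minimal, self-contained illustration (not the paper's implementation): blocks are produced left to right as in an autoregressive model, while tokens within each block are denoised in parallel over a few masked-diffusion steps. The `denoiser` here is a random-logit placeholder standing in for a trained network, and the uniform unmasking schedule is an illustrative choice; the paper instead learns data-driven noise schedules, and the outer block loop is where KV caching of previously generated blocks pays off.

```python
import torch

# Toy dimensions; MASK is an id outside the real vocabulary.
VOCAB, BLOCK, NUM_BLOCKS, STEPS = 1000, 16, 4, 8
MASK = VOCAB

def denoiser(context, block):
    """Placeholder network: returns random logits over the vocabulary for
    every position of the current (partially masked) block. A real model
    would condition on `context` (all previously generated blocks, whose
    keys/values can be cached) and on the visible tokens of `block`."""
    return torch.randn(block.shape[0], VOCAB)

@torch.no_grad()
def sample_block_diffusion():
    sequence = torch.empty(0, dtype=torch.long)
    for _ in range(NUM_BLOCKS):                  # outer loop: autoregressive over blocks
        block = torch.full((BLOCK,), MASK)       # start the block fully masked
        for step in range(STEPS):                # inner loop: parallel denoising
            logits = denoiser(sequence, block)
            proposal = torch.distributions.Categorical(logits=logits).sample()
            # Reveal a growing fraction of masked positions each step
            # (a uniform schedule; the paper learns data-driven schedules).
            target_unmasked = BLOCK * (step + 1) // STEPS
            masked = (block == MASK).nonzero().squeeze(-1)
            n_reveal = target_unmasked - (BLOCK - len(masked))
            reveal = masked[torch.randperm(len(masked))[:n_reveal]]
            block[reveal] = proposal[reveal]
        sequence = torch.cat([sequence, block])  # block is clean; append and continue
    return sequence

print(sample_block_diffusion().shape)  # torch.Size([64])
```

The block size controls the interpolation: a block of one token recovers ordinary autoregressive sampling, while a single block spanning the whole sequence recovers standard fixed-length diffusion.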
