
Continuous Speculative Decoding for Autoregressive Image Generation

November 18, 2024
Authors: Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang
cs.AI

Abstract

Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts, showcasing considerable reconstruction quality and higher generation fidelity. However, the computational demands of the autoregressive framework result in significant inference overhead. While speculative decoding has proven effective in accelerating Large Language Models (LLMs), its adaptation to continuous-valued visual autoregressive models remains unexplored. This work generalizes the speculative decoding algorithm from discrete tokens to continuous space. By analyzing the intrinsic properties of the output distribution, we establish a tailored acceptance criterion for the diffusion distributions prevalent in such models. To overcome inconsistencies that arise in the speculative decoding output distribution, we introduce denoising trajectory alignment and token pre-filling methods. Additionally, we identify a hard-to-sample distribution in the rejection phase. To mitigate this issue, we propose a meticulous acceptance-rejection sampling method with a proper upper bound, thereby circumventing complex integration. Experimental results show that our continuous speculative decoding achieves a remarkable 2.33× speed-up on off-the-shelf models while maintaining the output distribution. Code will be available at https://github.com/MarkXCloud/CSpD
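The accept/reject scheme the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes both the draft density q and the target density p can be evaluated pointwise, uses Gaussians as stand-ins for the diffusion-based per-token distributions, and `resample_residual` is a hypothetical helper name.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    # Stand-in density; the paper's models output diffusion-based distributions.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def speculative_accept(x, p_density, q_density):
    # Continuous analogue of the discrete rule: accept the draft sample x
    # (drawn from q) with probability min(1, p(x) / q(x)).
    return random.random() < min(1.0, p_density(x) / q_density(x))

def resample_residual(p_density, q_density, sample_p):
    # On rejection, resample from the residual distribution proportional to
    # max(0, p(x) - q(x)). Since max(0, p - q) <= p, the target density p is
    # itself a valid envelope: draw x ~ p and accept with probability
    # max(0, p(x) - q(x)) / p(x), avoiding any normalization integral.
    while True:
        x = sample_p()
        if random.random() * p_density(x) < max(0.0, p_density(x) - q_density(x)):
            return x

if __name__ == "__main__":
    p = lambda x: gaussian_pdf(x, 0.0, 1.0)  # target model's density
    q = lambda x: gaussian_pdf(x, 0.5, 1.0)  # draft model's density
    draft = random.gauss(0.5, 1.0)           # one draft token from q
    if speculative_accept(draft, p, q):
        token = draft
    else:
        token = resample_residual(p, q, lambda: random.gauss(0.0, 1.0))
    print(f"emitted token: {token:.4f}")
```

Choosing p as the envelope is what lets the rejection step avoid integrating the residual mass, which is the role of the "proper upper bound" mentioned in the abstract.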
