Continuous Speculative Decoding for Autoregressive Image Generation
November 18, 2024
Authors: Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang
cs.AI
Abstract
Continuous-valued Autoregressive (AR) image generation models have
demonstrated notable superiority over their discrete-token counterparts,
showcasing considerable reconstruction quality and higher generation fidelity.
However, the computational demands of the autoregressive framework result in
significant inference overhead. While speculative decoding has proven effective
in accelerating Large Language Models (LLMs), its adaptation to
continuous-valued visual autoregressive models remains unexplored. This work
generalizes the speculative decoding algorithm from discrete tokens to
continuous space. By analyzing the intrinsic properties of the output distribution,
we establish a tailored acceptance criterion for the diffusion distributions
prevalent in such models. To overcome the inconsistency that occurs in
speculative decoding output distributions, we introduce denoising trajectory
alignment and token pre-filling methods. Additionally, we identify the
hard-to-sample distribution in the rejection phase. To mitigate this issue, we
propose a meticulous acceptance-rejection sampling method with a proper upper
bound, thereby circumventing complex integration. Experimental results show
that our continuous speculative decoding achieves a remarkable 2.33×
speed-up on off-the-shelf models while maintaining the output distribution.
Codes will be available at https://github.com/MarkXCloud/CSpD
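The core steps the abstract describes — drafting a continuous token, accepting it with a density-ratio criterion, and resampling from the residual distribution via acceptance-rejection with a simple upper bound — can be sketched as follows. This is a minimal illustrative sketch, assuming 1-D Gaussian draft (q) and target (p) densities; the paper's models output diffusion distributions, and all function names below are hypothetical, not taken from the released code.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x (stand-in for the model's density)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def speculative_step(p_mean, p_std, q_mean, q_std, rng):
    """One continuous speculative decoding step.

    Draft a token x ~ q, then accept it with probability min(1, p(x)/q(x)).
    On rejection, sample the residual distribution proportional to
    max(0, p - q) by acceptance-rejection: since max(0, p - q) <= p,
    proposing from p with acceptance probability max(0, p - q) / p is a
    valid scheme and needs no normalizing integral.
    """
    x = rng.gauss(q_mean, q_std)
    ratio = gaussian_pdf(x, p_mean, p_std) / gaussian_pdf(x, q_mean, q_std)
    if rng.random() < min(1.0, ratio):
        return x, True  # draft token accepted
    while True:  # rejection phase: resample from the residual
        y = rng.gauss(p_mean, p_std)
        p_y = gaussian_pdf(y, p_mean, p_std)
        q_y = gaussian_pdf(y, q_mean, q_std)
        if rng.random() * p_y < max(0.0, p_y - q_y):
            return y, False

rng = random.Random(0)
token, accepted = speculative_step(p_mean=0.3, p_std=1.0,
                                   q_mean=0.0, q_std=1.0, rng=rng)
```

The M = 1 proposal bound is what makes the rejection phase tractable here: drawing from p and thinning by max(0, p - q) / p avoids integrating the residual, echoing the "proper upper bound" the abstract mentions, though the paper's actual bound is tailored to diffusion outputs.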