Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Abstract

Current large auto-regressive models can generate high-quality, high-resolution images, but they require hundreds or even thousands of next-token-prediction steps during inference, which makes generation slow. Prior work has used Jacobi decoding, an iterative parallel decoding algorithm that can be applied without training, to accelerate auto-regressive generation. However, Jacobi decoding relies on a deterministic criterion to decide when its iterations have converged. It therefore works for greedy decoding but is incompatible with sampling-based decoding, which is crucial for visual quality and diversity in current auto-regressive text-to-image generation. In this paper, we propose Speculative Jacobi Decoding (SJD), a training-free probabilistic parallel decoding algorithm that accelerates auto-regressive text-to-image generation. By introducing a probabilistic convergence criterion, SJD speeds up inference while preserving the randomness of sampling-based token decoding, so the model can still generate diverse images. Specifically, SJD lets the model predict multiple tokens at each step and accepts tokens according to the probabilistic criterion, so an image is generated in fewer steps than under the conventional next-token-prediction paradigm. We also investigate token initialization strategies that exploit the spatial locality of visual data to further improve the acceleration ratio in specific scenarios. Experiments with SJD on multiple auto-regressive text-to-image generation models show that it accelerates generation without sacrificing visual quality.
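The abstract does not spell out the probabilistic convergence criterion. The sketch below assumes the standard speculative-sampling rule: a draft token x, drawn from a distribution q in an earlier Jacobi iteration, is accepted with probability min(1, p(x)/q(x)), where p is the model's current distribution at that position; on rejection, a replacement is drawn from the normalized residual max(0, p - q). `toy_model`, `verify_token`, `sjd_generate`, `VOCAB`, the window size, and the uniform random initialization of new draft positions are hypothetical choices for illustration, not the paper's implementation. The per-position loop also stands in for what would be a single batched forward pass in a real model.

```python
import numpy as np

VOCAB = 16  # hypothetical toy vocabulary size


def toy_model(prefix: list[int], vocab: int = VOCAB) -> np.ndarray:
    """Hypothetical stand-in for an AR model: next-token distribution given a prefix."""
    last = prefix[-1] if prefix else 0
    logits = 3.0 * np.cos(np.arange(vocab) * (last + 1) + len(prefix))
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


def verify_token(x: int, p: np.ndarray, q: np.ndarray, rng: np.random.Generator):
    """Assumed speculative-sampling check of draft token x (drawn from q) against p.

    Returns (token, accepted). On rejection, a replacement is drawn from the
    normalized residual max(0, p - q).
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    resid = np.maximum(p - q, 0.0)
    total = resid.sum()
    resid = resid / total if total > 0 else p
    return int(rng.choice(len(p), p=resid)), False


def sjd_generate(prompt: list[int], num_tokens: int, window: int = 4, seed: int = 0):
    """Sketch of a speculative-Jacobi loop; returns (tokens, iterations)."""
    rng = np.random.default_rng(seed)
    uniform = np.full(VOCAB, 1.0 / VOCAB)
    out = list(prompt)
    target = len(prompt) + num_tokens
    # Hypothetical initialization: random draft tokens with a uniform q.
    drafts = [int(rng.integers(VOCAB)) for _ in range(window)]
    draft_q = [uniform for _ in range(window)]
    steps = 0
    while len(out) < target:
        steps += 1
        # A real model would compute all of these in ONE batched forward pass;
        # p[i] is conditioned on the committed tokens plus drafts[:i].
        p = [toy_model(out + drafts[:i]) for i in range(len(drafts))]
        n_new = 0
        for i, x in enumerate(drafts):
            tok, ok = verify_token(x, p[i], draft_q[i], rng)
            out.append(tok)
            n_new += 1
            if not ok:
                break  # later drafts were conditioned on a rejected token
        # Jacobi update: re-sample the uncommitted positions from this pass's p,
        # keeping each distribution as the q for the next verification.
        drafts = [int(rng.choice(VOCAB, p=p[j])) for j in range(n_new, len(p))]
        draft_q = [p[j] for j in range(n_new, len(p))]
        # Refill the window with freshly initialized random drafts.
        while len(drafts) < window:
            drafts.append(int(rng.integers(VOCAB)))
            draft_q.append(uniform)
    return out[:target], steps
```

With these toy settings, `sjd_generate([1, 2], num_tokens=20)` returns 22 tokens (prompt included). Because every iteration commits at least one token, the iteration count is at most 20, and it falls below that when several drafts pass verification in the same iteration. The spatial-locality initialization mentioned in the abstract would replace the uniform random refill; it is not modeled here.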
