
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

December 5, 2024
Authors: Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
cs.AI

Abstract

In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and spatially distant regions tend to have minimal interdependence. Given a partially decoded set of visual tokens, in addition to the original next-token prediction scheme in the row dimension, the tokens corresponding to spatially adjacent regions in the column dimension can be decoded in parallel, enabling the "next-set prediction" paradigm. By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency. Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining.
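The next-set idea can be sketched as a decoding schedule over the 2-D token grid: a token becomes decodable once its left neighbor is done and, thanks to spatial locality, only a bounded window of the row above has been decoded, so several rows advance in the same forward pass. The following is a minimal sketch that counts forward passes under this schedule; the `window` parameter and the exact readiness condition are illustrative assumptions, not the paper's precise formulation.

```python
def zipar_passes(rows, cols, window):
    """Count forward passes for a ZipAR-style next-set schedule on a
    rows x cols token grid.

    Assumption (illustrative): token (r, c) is decodable once its left
    neighbor (r, c-1) is done and, for r > 0, the token at column
    min(c + window, cols - 1) in row r - 1 is done. Each pass decodes
    one ready token per active row in parallel.
    """
    done = [[False] * cols for _ in range(rows)]
    decoded, passes = 0, 0
    while decoded < rows * cols:
        ready = []
        for r in range(rows):
            # First undecoded column in this row (its left context is complete).
            c = next((j for j in range(cols) if not done[r][j]), None)
            if c is None:
                continue
            if r == 0 or done[r - 1][min(c + window, cols - 1)]:
                ready.append((r, c))
        for r, c in ready:       # decoded together in one forward pass
            done[r][c] = True
        decoded += len(ready)
        passes += 1
    return passes
```

With `window = cols - 1` every row must wait for the full row above, recovering strictly sequential next-token decoding (`rows * cols` passes); a small window turns decoding into a diagonal wavefront, e.g. `window = 0` needs only `rows + cols - 1` passes, which is where the large reduction in forward passes comes from.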

