
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

December 5, 2024
Authors: Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
cs.AI

Abstract

In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and spatially distant regions tend to have minimal interdependence. Given a partially decoded set of visual tokens, in addition to the original next-token prediction scheme in the row dimension, the tokens corresponding to spatially adjacent regions in the column dimension can be decoded in parallel, enabling the "next-set prediction" paradigm. By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency. Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining.
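The next-set idea can be sketched as a decoding schedule over the 2-D token grid: a token becomes decodable once its left neighbor is done and, thanks to spatial locality, only a bounded window of the row above has been decoded, so several rows advance in the same forward pass. The following is a minimal sketch that counts forward passes under this schedule; the `window` parameter and the exact readiness condition are illustrative assumptions, not the paper's precise formulation.

```python
def zipar_passes(rows, cols, window):
    """Count forward passes for a ZipAR-style next-set schedule on a
    rows x cols token grid.

    Assumption (illustrative): token (r, c) is decodable once its left
    neighbor (r, c-1) is done and, for r > 0, the token at column
    min(c + window, cols - 1) in row r - 1 is done. Each pass decodes
    one ready token per active row in parallel.
    """
    done = [[False] * cols for _ in range(rows)]
    decoded, passes = 0, 0
    while decoded < rows * cols:
        ready = []
        for r in range(rows):
            # First undecoded column in this row (its left context is complete).
            c = next((j for j in range(cols) if not done[r][j]), None)
            if c is None:
                continue
            if r == 0 or done[r - 1][min(c + window, cols - 1)]:
                ready.append((r, c))
        for r, c in ready:       # decoded together in one forward pass
            done[r][c] = True
        decoded += len(ready)
        passes += 1
    return passes
```

With `window = cols - 1` every row must wait for the full row above, recovering strictly sequential next-token decoding (`rows * cols` passes); a small window turns decoding into a diagonal wavefront, e.g. `window = 0` needs only `rows + cols - 1` passes, which is where the large reduction in forward passes comes from.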

