

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

December 5, 2024
Authors: Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
cs.AI

Abstract
In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and spatially distant regions tend to have minimal interdependence. Given a partially decoded set of visual tokens, in addition to the original next-token prediction scheme in the row dimension, the tokens corresponding to spatially adjacent regions in the column dimension can be decoded in parallel, enabling the "next-set prediction" paradigm. By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency. Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining.
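To make the "next-set prediction" idea concrete, the sketch below simulates how many forward passes such a schedule needs on an H×W token grid. The dependency rule used here is an assumption for illustration, not the paper's exact scheduling algorithm: a token becomes decodable once its left neighbor is done and the row above has advanced a fixed `window` of tokens ahead, and all currently decodable tokens are decoded together in one pass.

```python
# Minimal sketch of a ZipAR-style "next-set prediction" schedule.
# Assumed rule (hypothetical, not from the paper's pseudocode):
# token (r, c) on an H x W grid is decodable once
#   (a) its left neighbor (r, c-1) is decoded, and
#   (b) token (r-1, c + window) in the row above is decoded,
# i.e. the previous row leads by `window` tokens. Each "forward pass"
# decodes every currently decodable token at once.

def zipar_passes(h: int, w: int, window: int) -> int:
    done = [[False] * w for _ in range(h)]

    def ready(r, c):
        if done[r][c]:
            return False
        if c > 0 and not done[r][c - 1]:
            return False  # left neighbor must be decoded first
        if r > 0 and not done[r - 1][min(c + window, w - 1)]:
            return False  # row above must lead by `window` tokens
        return True

    passes, remaining = 0, h * w
    while remaining:
        batch = [(r, c) for r in range(h) for c in range(w) if ready(r, c)]
        for r, c in batch:
            done[r][c] = True
        remaining -= len(batch)
        passes += 1
    return passes

# Plain next-token AR needs h*w passes; with a small window, rows stream
# in parallel and the pass count drops to roughly w + 2*(h-1) here.
print(zipar_passes(24, 24, 1))   # far fewer than 24*24 = 576 passes
print(zipar_passes(24, 24, 24))  # window spanning the row -> sequential AR
```

With `window=1` the schedule finishes a 24×24 grid in well under a fifth of the 576 passes plain next-token decoding needs, which is the kind of reduction the abstract's 91% figure reflects (the real gain depends on the model's grid size and window choice).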

