Adaptive Caching for Faster Video Generation with Diffusion Transformers

November 4, 2024
Authors: Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Michael S. Ryoo, Tian Xie
cs.AI

Abstract

Generating temporally-consistent, high-fidelity videos can be computationally expensive, especially over longer temporal spans. Recent Diffusion Transformers (DiTs), despite making significant headway in this context, have only heightened such challenges, as they rely on larger models and heavier attention mechanisms, resulting in slower inference speeds. In this paper, we introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache), which is motivated by the fact that "not all videos are created equal": some videos require fewer denoising steps than others to attain a reasonable quality. Building on this, we not only cache computations through the diffusion process, but also devise a caching schedule tailored to each video generation, maximizing the quality-latency trade-off. We further introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, essentially controlling the compute allocation based on motion content. Altogether, our plug-and-play contributions grant significant inference speedups (e.g., up to 4.7x on Open-Sora 720p, 2s video generation) without sacrificing generation quality, across multiple video DiT baselines.
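To make the mechanism concrete, below is a minimal sketch (not the authors' implementation) of the core idea: a transformer block's residual output is cached and reused across denoising steps until its input has drifted past a threshold, at which point it is recomputed, and a motion-dependent factor (standing in for MoReg) forces high-motion videos to refresh the cache more often. The class name `CachedBlock`, the relative-L1 `change_rate` metric, the `threshold` value, and the `motion_reg` scaling are all illustrative assumptions rather than the paper's exact formulation.

```python
import torch

class CachedBlock(torch.nn.Module):
    """Wraps a DiT block and reuses its cached residual across denoising steps.

    Illustrative sketch of the AdaCache idea; names and metrics are assumptions.
    """

    def __init__(self, block, threshold=0.05):
        super().__init__()
        self.block = block
        self.threshold = threshold
        self.cached_residual = None  # residual saved at the last full compute
        self.prev_input = None       # block input seen at the last full compute

    def change_rate(self, x):
        # Relative L1 distance between the current input and the input at the
        # last recompute -- a stand-in for the paper's distance metric that
        # decides how long a cached result remains valid.
        denom = self.prev_input.abs().mean() + 1e-8
        return ((x - self.prev_input).abs().mean() / denom).item()

    def forward(self, x, motion_reg=1.0):
        # motion_reg > 1 for high-motion content: it inflates the measured
        # change so the cache is refreshed more often (a MoReg-style effect).
        if self.cached_residual is None or self.change_rate(x) * motion_reg > self.threshold:
            self.cached_residual = self.block(x) - x  # full compute: refresh cache
            self.prev_input = x.detach()
        return x + self.cached_residual               # cheap step: reuse residual
```

In this spirit, a motion score feeding `motion_reg` could be estimated from frame-to-frame differences of the noisy video latents during sampling, so that static clips tolerate longer cache lifetimes while dynamic clips recompute more frequently.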
