Adaptive Caching for Faster Video Generation with Diffusion Transformers

November 4, 2024
Authors: Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Michael S. Ryoo, Tian Xie
cs.AI

Abstract

Generating temporally-consistent high-fidelity videos can be computationally expensive, especially over longer temporal spans. More-recent Diffusion Transformers (DiTs) -- despite making significant headway in this context -- have only heightened such challenges as they rely on larger models and heavier attention mechanisms, resulting in slower inference speeds. In this paper, we introduce a training-free method to accelerate video DiTs, termed Adaptive Caching (AdaCache), which is motivated by the fact that "not all videos are created equal": meaning, some videos require fewer denoising steps to attain a reasonable quality than others. Building on this, we not only cache computations through the diffusion process, but also devise a caching schedule tailored to each video generation, maximizing the quality-latency trade-off. We further introduce a Motion Regularization (MoReg) scheme to utilize video information within AdaCache, essentially controlling the compute allocation based on motion content. Altogether, our plug-and-play contributions grant significant inference speedups (e.g. up to 4.7x on Open-Sora 720p - 2s video generation) without sacrificing the generation quality, across multiple video DiT baselines.
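To make the idea concrete, here is a minimal sketch of the adaptive-caching mechanism the abstract describes: a transformer block's residual output is cached across denoising steps and reused whenever the block input has changed little, with a motion-dependent factor tightening the reuse criterion. All names here (`AdaCacheState`, `block_with_adaptive_cache`, `base_threshold`, `motion_score`) are illustrative assumptions, not the authors' actual API, and the simple relative-norm distance and threshold scaling stand in for the paper's distance metric and MoReg scheme.

```python
import torch

class AdaCacheState:
    """Holds one block's cached residual and the input it was computed from."""
    def __init__(self):
        self.cached_residual = None  # residual output of the block at the last full compute
        self.prev_features = None    # block input at the last full compute

def block_with_adaptive_cache(block, x, state, base_threshold=0.1, motion_score=0.0):
    """Run one DiT block at a denoising step, reusing its cached residual
    when the input has changed little since the last recomputation.

    A larger motion_score (MoReg-style regularization) shrinks the reuse
    threshold, so high-motion videos recompute more often.
    """
    threshold = base_threshold / (1.0 + motion_score)
    if state.prev_features is not None and state.cached_residual is not None:
        # Distance metric between current and cached representations.
        change = (x - state.prev_features).norm() / state.prev_features.norm()
        if change < threshold:
            # Reuse: skip the expensive attention/MLP computation.
            return x + state.cached_residual
    # Recompute: `block` is assumed to return the residual update (attn + MLP).
    residual = block(x)
    state.cached_residual = residual.detach()
    state.prev_features = x.detach()
    return x + residual
```

In the paper, the caching schedule is derived from a distance metric on representations across steps, tailored per generation, and MoReg allocates compute according to estimated motion content; the per-call threshold scaling above is only a stand-in for that schedule.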
