Adaptive Caching for Faster Video Generation with Diffusion Transformers
November 4, 2024
Authors: Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Michael S. Ryoo, Tian Xie
cs.AI
Abstract
Generating temporally-consistent high-fidelity videos can be computationally
expensive, especially over longer temporal spans. More recent Diffusion
Transformers (DiTs) -- despite making significant headway in this context --
have only heightened such challenges as they rely on larger models and heavier
attention mechanisms, resulting in slower inference speeds. In this paper, we
introduce a training-free method to accelerate video DiTs, termed Adaptive
Caching (AdaCache), which is motivated by the fact that "not all videos are
created equal": meaning, some videos require fewer denoising steps to attain a
reasonable quality than others. Building on this, we not only cache
computations through the diffusion process, but also devise a caching schedule
tailored to each video generation, maximizing the quality-latency trade-off. We
further introduce a Motion Regularization (MoReg) scheme to utilize video
information within AdaCache, essentially controlling the compute allocation
based on motion content. Altogether, our plug-and-play contributions grant
significant inference speedups (e.g. up to 4.7x on Open-Sora 720p - 2s video
generation) without sacrificing the generation quality, across multiple video
DiT baselines.
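The abstract describes two mechanisms that a short sketch can make concrete: reusing cached computations across denoising steps when features change little, and tightening that reuse under high motion (the MoReg intuition). Below is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the names (CachedBlock, motion_score, BASE_THRESHOLD), the relative-change metric, and the per-step threshold rule are all assumptions for illustration, since the paper's actual caching schedule and regularization are not given in this abstract.

```python
# Illustrative sketch of the AdaCache idea, NOT the paper's implementation.
# Assumption: a block's residual (output minus input) can be cached and reused
# on steps where the input features barely changed, with the reuse threshold
# shrunk for high-motion content.

import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """A toy transformer-style block whose residual can be cached and reused."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.cached_residual = None  # residual reused on "cheap" steps
        self.prev_input = None       # input features from the previous step

    def forward(self, x: torch.Tensor, threshold: float) -> torch.Tensor:
        if self.prev_input is not None and self.cached_residual is not None:
            # Relative change between this step's and the previous step's input.
            change = (x - self.prev_input).norm() / (self.prev_input.norm() + 1e-8)
            if change < threshold:
                # Features barely moved: skip the block, reuse the cached residual.
                return x + self.cached_residual
        residual = self.mlp(x)  # full computation on "expensive" steps
        self.cached_residual = residual.detach()
        self.prev_input = x.detach()
        return x + residual

def motion_score(latents: torch.Tensor) -> float:
    """Crude motion proxy (an assumption): mean magnitude of frame-to-frame
    latent differences. latents has shape (frames, dim)."""
    return (latents[1:] - latents[:-1]).abs().mean().item()

# Toy denoising loop over a (frames, dim) latent "video".
torch.manual_seed(0)
frames, dim, steps = 8, 64, 30
block = CachedBlock(dim)
x = torch.randn(frames, dim)

BASE_THRESHOLD = 0.05  # hypothetical; AdaCache derives its schedule per video
for step in range(steps):
    # Higher motion -> smaller threshold -> fewer cache hits (MoReg intuition).
    threshold = BASE_THRESHOLD / (1.0 + motion_score(x))
    x = block(x, threshold)
```

The point of the sketch is the trade-off the abstract names: static videos trigger many cache hits and thus fewer full block evaluations, while fast-moving content forces recomputation, so compute is allocated where quality would otherwise suffer.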