FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
October 25, 2024
Authors: Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong
cs.AI
Abstract
In this paper, we present FasterCache, a novel
training-free strategy designed to accelerate the inference of video diffusion
models with high-quality generation. By analyzing existing cache-based methods,
we observe that directly reusing adjacent-step features degrades video
quality due to the loss of subtle variations. We further perform a pioneering
investigation of the acceleration potential of classifier-free guidance (CFG)
and reveal significant redundancy between conditional and unconditional
features within the same timestep. Capitalizing on these observations, we
introduce FasterCache to substantially accelerate diffusion-based video
generation. Our key contributions include a dynamic feature reuse strategy that
preserves both feature distinction and temporal continuity, and CFG-Cache which
optimizes the reuse of conditional and unconditional outputs to further enhance
inference speed without compromising video quality. We empirically evaluate
FasterCache on recent video diffusion models. Experimental results show that
FasterCache can significantly accelerate video generation (e.g., a 1.67× speedup
on Vchitect-2.0) while keeping video quality comparable to the
baseline, and consistently outperform existing methods in both inference speed
and video quality.
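As a rough illustration of the caching idea described in the abstract, the sketch below runs a toy classifier-free-guidance sampling loop in PyTorch and, on alternate timesteps, reuses a cached unconditional output instead of recomputing it. Everything here (the denoise stub, the reuse_every schedule, the tensor shapes, the update rule) is a hypothetical stand-in for illustration only; it is not the authors' implementation, and it omits both the dynamic feature reuse and the corrections that CFG-Cache applies to the reused output to preserve quality.

```python
import torch

# Toy denoiser standing in for a video diffusion backbone; FasterCache itself
# reuses internal features as well, which this stub does not model.
def denoise(x, t, emb):
    return torch.tanh(x + 0.1 * t) * emb.mean()

def cfg_sample_with_cache(x, timesteps, cond, uncond,
                          guidance_scale=7.5, reuse_every=2):
    """CFG sampling loop that skips the unconditional forward pass on some
    timesteps and reuses a cached unconditional output instead, exploiting
    the conditional/unconditional redundancy noted in the abstract."""
    cached_uncond = None
    for i, t in enumerate(timesteps):
        cond_out = denoise(x, t, cond)
        if cached_uncond is None or i % reuse_every == 0:
            # full step: run both branches and refresh the cache
            cached_uncond = denoise(x, t, uncond)
        uncond_out = cached_uncond
        guided = uncond_out + guidance_scale * (cond_out - uncond_out)
        x = x - 0.01 * guided  # placeholder for the actual sampler update
    return x

# Usage with dummy shapes: (batch, channels, frames, height, width)
latents = torch.randn(1, 4, 8, 32, 32)
timesteps = torch.linspace(1.0, 0.0, 30)
cond = torch.randn(1, 16)    # stand-in text embedding
uncond = torch.zeros(1, 16)  # stand-in null embedding
out = cfg_sample_with_cache(latents, timesteps, cond, uncond)
print(out.shape)
```

The point of the sketch is only the structure of the loop: every reused timestep saves one backbone forward pass, which is where a caching-based speedup comes from, while the periodic full steps keep the cache from drifting too far from the current trajectory.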