WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
November 26, 2024
Authors: Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
cs.AI
Abstract
Video Variational Autoencoder (VAE) encodes videos into a low-dimensional
latent space, becoming a key component of most Latent Video Diffusion Models
(LVDMs) to reduce model training costs. However, as the resolution and duration
of generated videos increase, the encoding cost of Video VAEs becomes a
limiting bottleneck in training LVDMs. Moreover, the block-wise inference
method adopted by most LVDMs can lead to discontinuities in the latent space when
processing long-duration videos. The key to addressing the computational
bottleneck lies in decomposing videos into distinct components and efficiently
encoding the critical information. The wavelet transform can decompose videos into
multiple frequency-domain components and significantly improve efficiency.
We therefore propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages
a multi-level wavelet transform to facilitate low-frequency energy flow into
the latent representation. Furthermore, we introduce a method called Causal Cache,
which maintains the integrity of the latent space during block-wise inference.
Compared to state-of-the-art video VAEs, WF-VAE demonstrates superior
performance in both PSNR and LPIPS metrics, achieving 2x higher throughput and
4x lower memory consumption while maintaining competitive reconstruction
quality. Our code and models are available at
https://github.com/PKU-YuanGroup/WF-VAE.
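
The two ideas named in the abstract can be illustrated with a toy, one-dimensional sketch along the time axis. It is not the authors' implementation: the Haar wavelet, the function names (`haar_step`, `haar_multilevel`, `causal_conv`), the kernel, and the assumption that the cache holds the trailing inputs of the previous chunk are all chosen here for illustration only. The first part shows that a multi-level wavelet transform concentrates the energy of a smooth signal in the low-frequency band. The second shows that carrying a small cache between chunks can make block-wise causal convolution match full-sequence convolution.

```python
import math


def haar_step(signal):
    """One level of the orthonormal 1D Haar transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def haar_multilevel(signal, levels):
    """Split the low band repeatedly; returns (final_low, high bands, finest first)."""
    low, highs = list(signal), []
    for _ in range(levels):
        low, high = haar_step(low)
        highs.append(high)
    return low, highs


def energy(values):
    return sum(v * v for v in values)


def causal_conv(frames, kernel, cache=None):
    """Causal 1D convolution over time.

    `cache` holds the last len(kernel) - 1 inputs of the previous chunk
    (an assumption of this sketch). Without a cache, the first frame of
    the chunk is repeated as left padding.
    """
    pad = len(kernel) - 1
    if cache is None:
        cache = [frames[0]] * pad
    padded = list(cache) + list(frames)
    out = [
        sum(w * padded[t + j] for j, w in enumerate(kernel))
        for t in range(len(frames))
    ]
    new_cache = padded[len(padded) - pad:]
    return out, new_cache


# Part 1: energy of a smooth signal flows mostly into the low-frequency band.
signal = [math.sin(2 * math.pi * t / 64) + 2.0 for t in range(64)]
low, highs = haar_multilevel(signal, levels=2)
total_energy = energy(signal)
low_fraction = energy(low) / total_energy

# Part 2: chunked inference with and without a cache carried between chunks.
frames = [float(t) for t in range(10)]
kernel = [0.25, 0.5, 0.25]
full_out, _ = causal_conv(frames, kernel)

cached_out, cache = [], None
naive_out = []
for start in range(0, len(frames), 4):
    chunk = frames[start:start + 4]
    out, cache = causal_conv(chunk, kernel, cache)  # cache carried forward
    cached_out.extend(out)
    out, _ = causal_conv(chunk, kernel)  # each chunk padded independently
    naive_out.extend(out)
```

In this toy setting `cached_out` agrees with `full_out`, while `naive_out` diverges at every chunk boundary. That divergence is a one-dimensional analogue of the latent-space discontinuity the abstract attributes to block-wise inference.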