WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

November 26, 2024
Authors: Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
cs.AI

Abstract

A video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space and is a key component of most Latent Video Diffusion Models (LVDMs), where it reduces training cost. However, as the resolution and duration of generated videos increase, the encoding cost of the video VAE becomes a bottleneck in training LVDMs. Moreover, the block-wise inference method adopted by most LVDMs can introduce discontinuities in the latent space when processing long videos. The key to addressing the computational bottleneck lies in decomposing videos into distinct components and efficiently encoding the critical information. Because the wavelet transform can decompose videos into multiple frequency-domain components and significantly improve efficiency, we propose Wavelet Flow VAE (WF-VAE), an autoencoder that leverages a multi-level wavelet transform to facilitate the flow of low-frequency energy into the latent representation. Furthermore, we introduce a method called Causal Cache, which maintains the integrity of the latent space during block-wise inference. Compared to state-of-the-art video VAEs, WF-VAE demonstrates superior performance on both PSNR and LPIPS, achieving 2x higher throughput and 4x lower memory consumption while maintaining competitive reconstruction quality. Our code and models are available at https://github.com/PKU-YuanGroup/WF-VAE.
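
To make the idea of multi-level wavelet decomposition concrete, here is a minimal sketch of a two-level orthonormal Haar transform in plain Python. It is not the WF-VAE implementation: the choice of the Haar wavelet, the function names (`haar_step`, `haar_multilevel`, `energy`), and the toy input (one pixel's brightness over 8 frames) are all assumptions made for illustration. The sketch only shows that, for a slowly varying signal, most of the energy ends up in the low-frequency band.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence.

    Returns (low, high): scaled pairwise sums and pairwise differences.
    """
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def haar_multilevel(signal, levels):
    """Apply haar_step repeatedly to the low band (multi-level decomposition).

    Returns the final low band and the list of high bands, finest first.
    """
    low = list(signal)
    highs = []
    for _ in range(levels):
        low, high = haar_step(low)
        highs.append(high)
    return low, highs


def energy(values):
    """Sum of squared coefficients."""
    return sum(v * v for v in values)


# Hypothetical data: brightness of a single pixel across 8 consecutive frames.
frames = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0, 14.0, 15.0]

low, highs = haar_multilevel(frames, levels=2)

# Fraction of the total signal energy carried by the low-frequency band.
low_share = energy(low) / energy(frames)
print(f"low-band coefficients: {low}")
print(f"low-band energy share: {low_share:.4f}")
```

Because the transform is orthonormal, the energies of the low band and all high bands add up to the energy of the input, so the share printed above is a meaningful measure of how much information the low-frequency band retains. A real video model would apply such a transform along the spatial and temporal axes of a tensor rather than to a single 1D sequence.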
