Diffusion Adversarial Post-Training for One-Step Video Generation

January 14, 2025
Authors: Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
cs.AI

Abstract

Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our adversarial post-trained model, Seaweed-APT, can generate 2-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
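The abstract mentions an approximated R1 regularization objective but does not spell out its form. Classic R1 penalizes the squared gradient norm of the discriminator at real samples, which requires higher-order gradients through the discriminator. A minimal illustrative sketch (not the paper's exact objective): one common way to approximate R1 without explicit gradient penalties is to perturb a real sample with small Gaussian noise and penalize the change in the discriminator's score. The `discriminator` below is a toy stand-in, and `sigma` is an assumed hyperparameter.

```python
import numpy as np

def discriminator(x):
    # Toy stand-in for a learned discriminator: a smooth scalar score.
    return np.tanh(x).sum()

def r1_exact(x, eps=1e-4):
    # Classic R1: squared norm of the gradient of D at a real sample,
    # computed here by central finite differences for the toy D.
    grad = np.empty_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        grad.flat[i] = (discriminator(x + d) - discriminator(x - d)) / (2 * eps)
    return (grad ** 2).sum()

def r1_approx(x, sigma=0.01, rng=None):
    # Approximated R1 (illustrative): perturb the real sample with small
    # Gaussian noise and penalize the squared change in the score, scaled
    # by sigma^2. In expectation over the noise this matches the exact
    # gradient-norm penalty for small sigma, with no second-order grads.
    if rng is None:
        rng = np.random.default_rng(0)
    x_noisy = x + sigma * rng.standard_normal(x.shape)
    return (discriminator(x) - discriminator(x_noisy)) ** 2 / sigma ** 2
```

Averaged over many noise draws, `r1_approx` converges to `r1_exact` as `sigma` shrinks, which is why such finite-difference surrogates are attractive for large models where double backpropagation is costly.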
