Diffusion Adversarial Post-Training for One-Step Video Generation
January 14, 2025
Authors: Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
cs.AI
Abstract
Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarial post-trained model, Seaweed-APT, can generate 2-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model is capable of generating 1024px images in a single step, achieving quality comparable to state-of-the-art methods.