Diffusion Adversarial Post-Training for One-Step Video Generation

Abstract

Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive. While existing distillation approaches have demonstrated the potential for one-step generation in the image domain, they still suffer from significant quality degradation. In this work, we propose Adversarial Post-Training (APT) against real data following diffusion pre-training for one-step video generation. To improve training stability and quality, we introduce several improvements to the model architecture and training procedures, along with an approximated R1 regularization objective. Empirically, our experiments show that our adversarially post-trained model, Seaweed-APT, can generate 2-second, 1280x720, 24fps videos in real time using a single forward evaluation step. Additionally, our model can generate 1024px images in a single step, achieving quality comparable to state-of-the-art methods.
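The abstract does not spell out how the R1 regularization is approximated. As a rough illustration only, the sketch below assumes one possible form: instead of computing the gradient of the discriminator with respect to real samples (which the exact R1 penalty requires), it perturbs each real sample with small Gaussian noise and penalizes the squared change in the discriminator output. The function name `approx_r1_penalty`, the toy linear discriminator, and the value of `sigma` are all hypothetical and are not taken from the paper.

```python
import random


def approx_r1_penalty(discriminator, real_batch, sigma=0.01, rng=None):
    """Finite-difference stand-in for an R1 gradient penalty (assumed form).

    For each real sample x, compare D(x) with D(x + noise), where the noise is
    Gaussian with standard deviation sigma, and average the squared difference.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for x in real_batch:
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        diff = discriminator(x) - discriminator(noisy)
        total += diff * diff
    return total / len(real_batch)


# Hypothetical toy discriminator: a fixed linear map, so its gradient is w.
w = [0.5, -1.0, 2.0]


def toy_discriminator(x):
    return sum(wi * xi for wi, xi in zip(w, x))


data_rng = random.Random(0)
real_batch = [[data_rng.uniform(-1.0, 1.0) for _ in w] for _ in range(20000)]

sigma = 0.01
penalty = approx_r1_penalty(toy_discriminator, real_batch, sigma, random.Random(1))

# For a linear discriminator the expected penalty is sigma^2 * ||w||^2,
# so this ratio should be close to 1.
grad_norm_sq = sum(wi * wi for wi in w)
ratio = penalty / (sigma ** 2 * grad_norm_sq)
print(f"penalty / (sigma^2 * ||grad D||^2) = {ratio:.3f}")
```

For the linear toy case, the perturbation penalty divided by `sigma ** 2` estimates the squared gradient norm that an exact R1 term would penalize, while needing only forward evaluations of the discriminator. Whether this matches the objective actually used for Seaweed-APT is not stated in the abstract.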

