Magic 1-For-1: Generating One Minute Video Clips within One Minute
February 11, 2025
Authors: Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou
cs.AI
Abstract
In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency. The key idea is simple: factorize the text-to-video generation task into two separate, easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation. We verify that, under the same optimization algorithm, the image-to-video task indeed converges more easily than the text-to-video task. We also explore a bag of optimization tricks to reduce the computational cost of training the image-to-video (I2V) model from three aspects: 1) speeding up model convergence with multi-modal prior condition injection; 2) reducing inference latency with adversarial step distillation; and 3) reducing inference memory cost with parameter sparsification. With these techniques, we are able to generate 5-second video clips within 3 seconds. By applying a test-time sliding window, we are able to generate a minute-long video within one minute, with significantly improved visual quality and motion dynamics, spending less than one second on average to generate each one-second video clip. We conduct a series of preliminary explorations to find the optimal trade-off between computational cost and video quality during diffusion step distillation, and we hope this can serve as a good foundation model for open-source exploration. The code and the model weights are available at https://github.com/DA-Group-PKU/Magic-1-For-1.
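
To make the pipeline concrete, here is a minimal sketch of the two-stage, sliding-window generation loop described above: text-to-image once, then repeated few-step image-to-video generation, with each new clip conditioned on the last frame of the previous one. The `t2i_model` and `i2v_model` callables, clip length, frame rate, and step count are hypothetical placeholders for illustration, not the actual Magic 1-For-1 API.

```python
# Sketch of the factorized T2V pipeline: text -> image (T2I), then
# image -> video (I2V) with a few distilled diffusion steps, extended
# to a minute-long video via a test-time sliding window.
import numpy as np

FPS = 24
CLIP_SECONDS = 5          # the model natively generates 5-second clips
NUM_DISTILLED_STEPS = 4   # assumption: few denoising steps after adversarial step distillation

def t2i_model(prompt: str) -> np.ndarray:
    """Hypothetical text-to-image stage; returns one H x W x 3 frame."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((256, 256, 3), dtype=np.float32)

def i2v_model(first_frame: np.ndarray, prompt: str, num_steps: int) -> np.ndarray:
    """Hypothetical image-to-video stage; returns a (T, H, W, 3) clip
    whose first frame matches the conditioning image."""
    t = FPS * CLIP_SECONDS
    clip = np.repeat(first_frame[None], t, axis=0)
    return clip  # a real model would run `num_steps` denoising steps here

def generate_minute_video(prompt: str, total_seconds: int = 60) -> np.ndarray:
    """Test-time sliding window: condition each new clip on the last
    frame of the previous clip, then concatenate all clips."""
    anchor = t2i_model(prompt)                                  # stage 1: text -> image
    clips = []
    generated = 0
    while generated < total_seconds:
        clip = i2v_model(anchor, prompt, NUM_DISTILLED_STEPS)   # stage 2: image -> video
        clips.append(clip)
        anchor = clip[-1]                                       # slide the window forward
        generated += CLIP_SECONDS
    return np.concatenate(clips, axis=0)

video = generate_minute_video("a red fox running through snow")
print(video.shape)  # (1440, 256, 256, 3) -> 60 s at 24 fps
```

This structure reflects why the factorization helps: the expensive open-ended T2V problem is reduced to one T2I call plus a sequence of I2V calls, each of which needs only a handful of distilled steps, so total latency scales roughly linearly with video length.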