FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

February 7, 2025
Authors: Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Binyue Peng, Ping Luo
cs.AI

Abstract

DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. However, high content and motion fidelity aligned with text prompts often requires a large number of model parameters and a substantial number of function evaluations (NFEs). Realistic and visually appealing details are typically reflected in high-resolution outputs, which further amplifies computational demands, especially for single-stage DiT models. To address these challenges, we propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality. In the first stage, prompt fidelity is prioritized through a low-resolution generation process that uses a large number of parameters and sufficient NFEs while remaining computationally efficient. The second stage establishes flow matching between low and high resolutions, effectively generating fine details with minimal NFEs. Quantitative and visual results demonstrate that FlashVideo achieves state-of-the-art high-resolution video generation with superior computational efficiency. Additionally, the two-stage design enables users to preview the initial output before committing to full-resolution generation, thereby significantly reducing computational costs and wait times and enhancing commercial viability.
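
The second stage described in the abstract, flow matching from a low-resolution result to a high-resolution one with few NFEs, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a straight-line interpolation path between the upsampled low-resolution clip and the high-resolution target, and it replaces the learned DiT with a hypothetical `oracle_velocity` that already knows the target. All function names and array shapes are illustrative assumptions.

```python
import numpy as np

def upsample_nearest(video, factor):
    """Nearest-neighbour spatial upsampling of a (T, H, W) array."""
    return video.repeat(factor, axis=1).repeat(factor, axis=2)

def flow_matching_pair(x0, x1, t):
    """Point on the straight path from x0 to x1, plus the velocity a
    network would be trained to regress (assumed linear path)."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1.
    Each step costs one function evaluation (NFE)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
high_res = rng.random((2, 8, 8))      # stand-in for a high-resolution clip
low_res = high_res[:, ::2, ::2]       # stand-in for the stage-one output
x0 = upsample_nearest(low_res, 2)     # start of the flow: upsampled low-res

def oracle_velocity(x, t):
    # Hypothetical stand-in for a trained model: points straight at the target.
    return (high_res - x) / (1.0 - t)

refined = euler_sample(x0, oracle_velocity, num_steps=4)  # 4 NFEs
```

Because the flow starts from the upsampled low-resolution clip instead of pure noise, the path to the target is short, which is one intuition for why few integration steps may suffice. In a real system the velocity would come from a trained network and the result would only approximate the target.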
