History-Guided Video Diffusion

February 10, 2025
Authors: Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann
cs.AI

Abstract

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency, further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos. Website: https://boyuan.space/history-guidance
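
For readers unfamiliar with CFG, the following is a minimal sketch of the generic guidance arithmetic with history frames playing the role of the condition. It is not the authors' DFoT implementation: `guided_prediction`, `toy_model`, and the flat-list stand-in for video frames are all hypothetical names and simplifications introduced here, and the weighting convention (weight 1 recovers the conditional prediction, weight 0 the unconditional one) is one common choice among several.

```python
from typing import Callable, List, Optional, Sequence

# Assumption: a video is represented as a flat sequence of floats purely for illustration.
Frames = Sequence[float]


def guided_prediction(
    model: Callable[[Frames, Optional[Frames]], List[float]],
    noisy: Frames,
    history: Frames,
    weight: float,
) -> List[float]:
    """Generic CFG combination, treating the history frames as the condition."""
    cond = model(noisy, history)  # prediction conditioned on the history
    uncond = model(noisy, None)   # prediction with the history dropped
    # Extrapolate from the unconditional prediction toward the conditional one.
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


def toy_model(noisy: Frames, history: Optional[Frames]) -> List[float]:
    # Hypothetical stand-in for a trained denoiser:
    # shifts every value by the mean of the history, or by 0 when there is none.
    shift = 0.0 if history is None else sum(history) / len(history)
    return [x + shift for x in noisy]


print(guided_prediction(toy_model, [0.0, 1.0], [2.0, 4.0], weight=1.5))
# prints [4.5, 5.5]
```

With a weight above 1, the output moves past the conditional prediction in the direction that the history suggests, which is the usual intuition behind guidance improving adherence to the condition.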
