
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

January 14, 2025
Authors: Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu
cs.AI

Abstract

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow. Source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
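To make the idea of "warped noise" concrete, here is a minimal PyTorch sketch of one step of the process the abstract describes: the per-frame Gaussian noise is carried along the optical flow so that it becomes temporally correlated, and each frame is then re-standardized so it remains spatially Gaussian. This is an illustrative assumption, not the authors' algorithm; the paper's method preserves spatial Gaussianity exactly, whereas this sketch only approximates it with bilinear resampling (via torch.nn.functional.grid_sample) followed by per-channel re-whitening. The function name warp_noise and the flow conventions are hypothetical.

```python
import torch
import torch.nn.functional as F

def warp_noise(noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Carry a Gaussian noise field from one frame to the next along optical flow.

    noise: (C, H, W) noise for the previous frame.
    flow:  (H, W, 2) backward flow in pixels (next frame -> previous frame).
    Returns (C, H, W) noise for the next frame, correlated with the previous
    frame along the flow, then re-whitened per channel so each frame stays
    approximately N(0, 1) spatially (a crude stand-in for the paper's
    Gaussianity-preserving noise transport).
    """
    C, H, W = noise.shape
    # Pixel coordinate grid, displaced by the flow and normalized to [-1, 1]
    # as required by grid_sample.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    grid_x = (xs + flow[..., 0]) / (W - 1) * 2 - 1
    grid_y = (ys + flow[..., 1]) / (H - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1).unsqueeze(0)  # (1, H, W, 2)

    # Resample the previous frame's noise at the flow-displaced locations.
    warped = F.grid_sample(
        noise.unsqueeze(0), grid,
        mode="bilinear", padding_mode="border", align_corners=True,
    ).squeeze(0)

    # Bilinear interpolation shrinks the variance; re-standardize per channel
    # so the frame remains roughly unit-variance Gaussian.
    mean = warped.mean(dim=(1, 2), keepdim=True)
    std = warped.std(dim=(1, 2), keepdim=True)
    return (warped - mean) / (std + 1e-8)

# Example: build a temporally correlated noise sequence from (stand-in) flows.
noise = torch.randn(4, 64, 64)           # latent-space noise for frame 0
flows = torch.randn(15, 64, 64, 2) * 2.0 # placeholder for estimated optical flow
sequence = [noise]
for flow in flows:
    sequence.append(warp_noise(sequence[-1], flow))
```

Because such a step touches only the noise fed to the sampler, it matches the abstract's claim that the approach changes nothing about the model architecture or training pipeline: the diffusion model simply sees structured instead of i.i.d. temporal noise.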
