MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

October 26, 2024
Authors: Haozhe Liu, Shikun Liu, Zijian Zhou, Mengmeng Xu, Yanping Xie, Xiao Han, Juan C. Pérez, Ding Liu, Kumara Kahatapitiya, Menglin Jia, Jui-Chieh Wu, Sen He, Tao Xiang, Jürgen Schmidhuber, Juan-Manuel Pérez-Rúa
cs.AI

Abstract

We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model, which contains most of the parameters, generates planning signals for each masked frame from low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion denoising. MarDini's MAR enables video generation conditioned on any number of masked frames at any frame position: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). This efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state of the art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
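
The three masking patterns named in the abstract can be illustrated with a small sketch. This is not MarDini's code: the function name `frame_mask`, the task labels, and the choice to mask the second half of the clip for video expansion are all assumptions made here for illustration (the abstract only says "masking half the frames").

```python
from typing import List


def frame_mask(num_frames: int, task: str) -> List[bool]:
    """Return a per-frame mask; True marks a frame the model must generate.

    Hypothetical helper: the name, the task labels, and the exact patterns
    are illustrative assumptions, not taken from the MarDini release.
    """
    if num_frames < 3:
        raise ValueError("need at least 3 frames to form every pattern")
    if task == "interpolation":
        # Keep the first and last frames as conditioning, mask the middle.
        return [0 < i < num_frames - 1 for i in range(num_frames)]
    if task == "image_to_video":
        # Keep only the first frame, mask from the second frame onward.
        return [i >= 1 for i in range(num_frames)]
    if task == "expansion":
        # Assumption: keep the first half as context, mask the second half.
        return [i >= num_frames // 2 for i in range(num_frames)]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name in ("interpolation", "image_to_video", "expansion"):
        print(name, frame_mask(6, name))
```

Because the mask is the only thing that changes between tasks, a single model that accepts arbitrary masks can in principle serve all three without task-specific heads, which is the flexibility the abstract attributes to MAR.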
