VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer
February 9, 2025
Authors: Xinyu Liu, Ailing Zeng, Wei Xue, Harry Yang, Wenhan Luo, Qifeng Liu, Yike Guo
cs.AI
Abstract
Crafting magic and illusions is one of the most thrilling aspects of
filmmaking, with visual effects (VFX) serving as the powerhouse behind
unforgettable cinematic experiences. While recent advances in generative
artificial intelligence have driven progress in generic image and video
synthesis, the domain of controllable VFX generation remains relatively
underexplored. In this work, we propose a novel paradigm for animated VFX
generation as image animation, where dynamic effects are generated from
user-friendly textual descriptions and static reference images.
Our work makes two primary contributions: (i) Open-VFX, the first
high-quality VFX video dataset spanning 15 diverse effect categories, annotated
with textual descriptions, instance segmentation masks for spatial
conditioning, and start-end timestamps for temporal control; and (ii) VFX Creator,
a simple yet effective controllable VFX generation framework based on a Video
Diffusion Transformer. The model incorporates a spatially and temporally
controllable LoRA adapter, requiring only a few training videos. Specifically, a
plug-and-play mask control module enables instance-level spatial manipulation,
while tokenized start-end motion timestamps embedded in the diffusion process,
alongside the text encoder, allow precise temporal control over effect timing
and pace.
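
To make the "LoRA adapter" idea concrete, here is a minimal sketch of a generic low-rank adaptation layer in plain Python. It is not the VFX Creator implementation: the abstract does not say which layers are adapted or what rank is used, so the class name `LoRALinear`, the `rank` and `alpha` parameters, and the initialization are illustrative assumptions that follow the standard LoRA formulation (frozen weight W plus a scaled low-rank update B·A).

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base weight, shape (d_out, d_in)
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts with small random values; B starts at zero, so the
        # adapter leaves the base layer unchanged before any training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A) as a list of rows."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to one input vector of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def num_trainable(self):
        """Count adapter parameters (A and B only; W stays frozen)."""
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)
```

Only A and B would be trained, which is why an adapter of this kind can be fitted from a small number of videos per effect: for a d_out × d_in layer it adds rank × (d_in + d_out) parameters instead of d_out × d_in.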
Extensive experiments on the Open-VFX test set demonstrate the superiority of
the proposed system in generating realistic and dynamic effects, achieving
state-of-the-art performance and generalization ability in both spatial and
temporal controllability. Furthermore, we introduce a specialized metric to
evaluate the precision of temporal control. By bridging traditional VFX
techniques with generative approaches, VFX Creator unlocks new possibilities
for efficient and high-quality video effect generation, making advanced VFX
accessible to a broader audience.