LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
December 19, 2024
Authors: Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, Limin Wang
cs.AI
Abstract
The intuitive nature of drag-based interaction has led to its growing
adoption for controlling object trajectories in image-to-video synthesis.
Still, existing methods that perform dragging in the 2D space usually face
ambiguity when handling out-of-plane movements. In this work, we augment the
interaction with a new dimension, i.e., the depth dimension, such that users
are allowed to assign a relative depth for each point on the trajectory. That
way, our new interaction paradigm not only inherits the convenience of 2D
dragging but also facilitates trajectory control in the 3D space, broadening the
scope of creativity. We propose a pioneering method for 3D trajectory control
in image-to-video synthesis by abstracting object masks into a few cluster
points. These points, accompanied by the depth information and the instance
information, are finally fed into a video diffusion model as the control
signal. Extensive experiments validate the effectiveness of our approach,
dubbed LeviTor, in precisely manipulating the object movements when producing
photo-realistic videos from static images. Project page:
https://ppetrichor.github.io/levitor.github.io/
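The abstract describes abstracting each object mask into a few cluster points and pairing them with depth and instance information as the control signal. The sketch below is not the authors' code; it illustrates one plausible construction of such points with a plain-NumPy k-means over foreground pixel coordinates. The function name, the `k=8` default, and the (row, col, depth, instance_id) layout are all illustrative assumptions.

```python
# Illustrative sketch (not LeviTor's implementation): abstract a binary object
# mask into a few cluster points and attach a relative depth value and an
# instance id to each, mimicking the control signal described in the abstract.
import numpy as np

def mask_to_control_points(mask, depth, instance_id, k=8, iters=20, seed=0):
    """Cluster foreground pixels of `mask` into k representative points.

    Returns an array of shape (k, 4): (row, col, depth, instance_id).
    `depth` is a per-pixel relative depth map the same shape as `mask`.
    """
    rng = np.random.default_rng(seed)
    coords = np.argwhere(mask)                        # (N, 2) foreground pixels
    centers = coords[rng.choice(len(coords), size=k, replace=False)].astype(float)
    for _ in range(iters):                            # plain Lloyd's iterations
        dists = np.linalg.norm(coords[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = coords[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    rows = centers[:, 0].astype(int).clip(0, mask.shape[0] - 1)
    cols = centers[:, 1].astype(int).clip(0, mask.shape[1] - 1)
    depths = depth[rows, cols]                        # relative depth per point
    inst = np.full(k, instance_id)
    return np.stack([rows, cols, depths, inst], axis=1)

# Toy example: a rectangular object in a 64x64 frame with a synthetic depth map.
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 10:30] = True
depth = np.linspace(0.0, 1.0, 64)[None].repeat(64, axis=0)  # deeper to the right
points = mask_to_control_points(mask, depth, instance_id=1, k=4)
print(points.shape)  # (4, 4)
```

In a pipeline of this kind, the user would then drag these points (with per-point depth edits) across frames, and the resulting point trajectories would be encoded as conditioning input for the video diffusion model.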