LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
December 19, 2024
Authors: Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, Limin Wang
cs.AI
Abstract
The intuitive nature of drag-based interaction has led to its growing
adoption for controlling object trajectories in image-to-video synthesis.
Still, existing methods that perform dragging in the 2D space usually face
ambiguity when handling out-of-plane movements. In this work, we augment the
interaction with a new dimension, i.e., the depth dimension, such that users
are allowed to assign a relative depth for each point on the trajectory. That
way, our new interaction paradigm not only inherits the convenience of 2D
dragging but also facilitates trajectory control in 3D space, broadening the
scope of creativity. We propose a pioneering method for 3D trajectory control
in image-to-video synthesis by abstracting object masks into a few cluster
points. These points, accompanied by the depth information and the instance
information, are finally fed into a video diffusion model as the control
signal. Extensive experiments validate the effectiveness of our approach,
dubbed LeviTor, in precisely manipulating the object movements when producing
photo-realistic videos from static images. Project page:
https://ppetrichor.github.io/levitor.github.io/
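The abstract says that object masks are abstracted into a few cluster points, which are combined with depth and instance information to form the control signal, but it does not say how the points are chosen or how depth enters. Below is a minimal NumPy sketch of that idea under our own assumptions: plain K-means on mask pixel coordinates, a relative depth map with positive values, and pinhole-style rescaling of the point spread as the object moves in depth. The helper names `mask_to_cluster_points` and `build_control_signal` are hypothetical and are not LeviTor's code. The video diffusion model that would consume the signal is out of scope.

```python
import numpy as np


def mask_to_cluster_points(mask, depth, k=4, iters=50, seed=0):
    """Hypothetical helper: summarise a binary mask as k (x, y, depth) points.

    ASSUMPTION: plain K-means on pixel coordinates and a relative depth map;
    neither choice is taken from the LeviTor paper.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("mask is empty")
    pts = np.stack([xs, ys], axis=1).astype(np.float64)
    k = min(k, len(pts))
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # Assign every mask pixel to its nearest centre.
        labels = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # Move each centre to the mean of its members (keep it if empty).
        new_centers = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the converged centres.
    labels = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    pixel_depth = depth[ys, xs]
    cluster_depth = np.array([
        pixel_depth[labels == j].mean() if np.any(labels == j)
        else depth[int(round(centers[j, 1])), int(round(centers[j, 0]))]
        for j in range(k)
    ])
    return np.column_stack([centers, cluster_depth])  # shape (k, 3)


def build_control_signal(cluster_pts, trajectory, instance_id):
    """Hypothetical helper: move cluster points along a user 3D trajectory.

    trajectory: (T, 3) rows of (x, y, relative_depth) for the object centre,
    i.e. a 2D drag path with a depth value attached to every point.
    Returns (T, k, 4) rows of (x, y, depth, instance_id).
    """
    traj = np.asarray(trajectory, dtype=np.float64)
    if np.any(traj[:, 2] <= 0):
        raise ValueError("relative depths must be positive")
    offsets_xy = cluster_pts[:, :2] - cluster_pts[:, :2].mean(axis=0)
    offsets_z = cluster_pts[:, 2] - cluster_pts[:, 2].mean()
    ids = np.full(len(cluster_pts), float(instance_id))
    frames = []
    for x, y, z in traj:
        # ASSUMPTION (pinhole camera): apparent size scales with 1 / depth.
        scale = traj[0, 2] / z
        xy = np.array([x, y]) + offsets_xy * scale
        frames.append(np.column_stack([xy, z + offsets_z, ids]))
    return np.stack(frames)


# Usage: a square object dragged to the right while being pushed farther away.
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 10:30] = True
depth = np.full((64, 64), 0.5)  # flat relative depth map
pts = mask_to_cluster_points(mask, depth, k=4)
traj = [(19.5, 29.5, 0.5), (30.0, 29.5, 0.75), (40.0, 29.5, 1.0)]
ctrl = build_control_signal(pts, traj, instance_id=1)
print(ctrl.shape)  # (3, 4, 4)
```

Rescaling by the ratio of depths is one simple way to make pushing an object away shrink its footprint on screen while a plain 2D drag would keep its size fixed. The paper may encode depth differently, for example as a separate input channel rather than as a geometric rescaling.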