Trajectory Attention for Fine-grained Video Motion Control
November 28, 2024
作者: Zeqi Xiao, Wenqi Ouyang, Yifan Zhou, Shuai Yang, Lei Yang, Jianlou Si, Xingang Pan
cs.AI
Abstract
Recent advancements in video generation have been greatly driven by video
diffusion models, with camera motion control emerging as a crucial challenge in
creating view-customized visual content. This paper introduces trajectory
attention, a novel approach that performs attention along available pixel
trajectories for fine-grained camera motion control. Unlike existing methods
that often yield imprecise outputs or neglect temporal correlations, our
approach possesses a stronger inductive bias that seamlessly injects trajectory
information into the video generation process. Importantly, our approach models
trajectory attention as an auxiliary branch alongside traditional temporal
attention. This design enables the original temporal attention and the
trajectory attention to work in synergy, ensuring both precise motion control
and new content generation capability, which is critical when the trajectory is
only partially available. Experiments on camera motion control for images and
videos demonstrate significant improvements in precision and long-range
consistency while maintaining high-quality generation. Furthermore, we show
that our approach can be extended to other video motion control tasks, such as
first-frame-guided video editing, where it excels in maintaining content
consistency over large spatial and temporal ranges.
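The core mechanism described above, attention restricted to tokens lying on the same pixel trajectory, added as an auxiliary branch to temporal attention, can be sketched as follows. This is a minimal illustration under assumed tensor shapes, not the paper's implementation; the function name, the identity-like combination with the base features, and the projection-free attention are all simplifications for clarity.

```python
import torch
import torch.nn.functional as F

def attend_along_trajectories(x, traj):
    """Attention restricted to tokens on the same pixel trajectory.

    Hypothetical shapes (a sketch, not the paper's code):
      x:    [B, T, N, C]  video features: T frames of N tokens each
      traj: [B, P, T]     traj[b, p, t] = token index of trajectory p in frame t
    Returns an update of shape [B, T, N, C]: attended trajectory tokens are
    scattered back to their per-frame positions; other tokens stay zero.
    """
    B, T, N, C = x.shape
    P = traj.shape[1]
    # Gather features along each trajectory: [B, T, P, C] -> [B, P, T, C]
    idx = traj.transpose(1, 2).unsqueeze(-1).expand(B, T, P, C)
    tokens = torch.gather(x, 2, idx).transpose(1, 2)
    # Self-attention over the T points of each trajectory (q = k = v here;
    # a real model would use learned query/key/value projections).
    attn = F.scaled_dot_product_attention(tokens, tokens, tokens)
    # Scatter the attended features back to their frame positions.
    out = torch.zeros_like(x)
    out.scatter_(2, idx, attn.transpose(1, 2))
    return out

# Auxiliary-branch design: the trajectory update is added on top of the
# temporal-attention output (stood in for by x itself in this sketch), so
# regions without trajectories still rely on the original temporal branch.
x = torch.randn(1, 8, 16, 32)            # 8 frames, 16 tokens/frame, dim 32
traj = torch.randint(0, 16, (1, 5, 8))   # 5 partial pixel trajectories
update = attend_along_trajectories(x, traj)
y = x + update
print(y.shape)  # torch.Size([1, 8, 16, 32])
```

The gather/attend/scatter pattern is what gives the branch its inductive bias: information flows only between positions that the tracker says depict the same content, which is why motion control stays precise even over long ranges.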