Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

December 8, 2024
Authors: Hyeonho Jeong, Chun-Hao Paul Huang, Jong Chul Ye, Niloy Mitra, Duygu Ceylan
cs.AI

Abstract

While recent foundational video generators produce visually rich output, they still struggle with appearance drift, where objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this is because there is no explicit supervision in terms of spatial tracking at the feature level. We propose Track4Gen, a spatially aware video generator that combines video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. Track4Gen merges the video generation and point tracking tasks into a single network by making minimal changes to existing video generation architectures. Using Stable Video Diffusion as a backbone, Track4Gen demonstrates that it is possible to unify video generation and point tracking, which are typically handled as separate tasks. Our extensive evaluations show that Track4Gen effectively reduces appearance drift, resulting in temporally stable and visually coherent video generation. Project page: hyeonho99.github.io/track4gen
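To make the described training objective concrete, here is a minimal PyTorch-style sketch of a combined denoising-plus-tracking loss of the kind the abstract outlines. All names (`unet`, `track_head`, `lambda_track`) and the feature-return interface are illustrative assumptions, not the paper's actual implementation.

```python
import torch.nn.functional as F

def track4gen_loss(unet, track_head, noisy_latents, timesteps, target_noise,
                   query_points, target_tracks, lambda_track=0.5):
    """Hypothetical combined objective: video diffusion loss plus a
    point-tracking loss computed on intermediate diffusion features."""
    # Denoising prediction; assume the backbone can also return its
    # intermediate spatio-temporal features (hypothetical interface).
    noise_pred, feats = unet(noisy_latents, timesteps, return_features=True)

    # Standard diffusion (denoising) loss.
    loss_diff = F.mse_loss(noise_pred, target_noise)

    # A lightweight head regresses each query point's location in every
    # frame from the diffusion features, supervised by ground-truth tracks.
    pred_tracks = track_head(feats, query_points)   # (B, frames, N, 2)
    loss_track = F.l1_loss(pred_tracks, target_tracks)

    return loss_diff + lambda_track * loss_track
```

The key design point, per the abstract, is that the tracking head reads the same intermediate features used for denoising, so the tracking gradient directly supervises those features toward frame-to-frame correspondence with only minimal changes to the generation architecture.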
