Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
December 8, 2024
Authors: Hyeonho Jeong, Chun-Hao Paul Huang, Jong Chul Ye, Niloy Mitra, Duygu Ceylan
cs.AI
Abstract
While recent foundational video generators produce visually rich output, they
still struggle with appearance drift, where objects gradually degrade or change
inconsistently across frames, breaking visual coherence. We hypothesize that
this is because there is no explicit supervision in terms of spatial tracking
at the feature level. We propose Track4Gen, a spatially aware video generator
that combines video diffusion loss with point tracking across frames, providing
enhanced spatial supervision on the diffusion features. Track4Gen merges the
video generation and point tracking tasks into a single network by making
minimal changes to existing video generation architectures. Using Stable Video
Diffusion as a backbone, Track4Gen demonstrates that it is possible to unify
video generation and point tracking, which are typically handled as separate
tasks. Our extensive evaluations show that Track4Gen effectively reduces
appearance drift, resulting in temporally stable and visually coherent video
generation. Project page: hyeonho99.github.io/track4gen
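The abstract describes combining the standard video diffusion (denoising) loss with a cross-frame point-tracking loss that supervises the diffusion features. Below is a minimal PyTorch-style sketch of such a combined objective under stated assumptions: the function name, the assumption that the backbone returns intermediate per-frame features, the feature-matching form of the tracking term, and the `lambda_track` weight are illustrative, not the authors' released implementation.

```python
import torch.nn.functional as F

def track4gen_style_loss(unet, noisy_latents, noise, timesteps, cond,
                         tracks, visibility, lambda_track=0.5):
    # noisy_latents: (B, F, C, H, W) latents after forward diffusion
    # noise:         (B, F, C, H, W) noise that was added (denoising target)
    # tracks:        (B, F, N, 2) ground-truth point coordinates in [-1, 1]
    # visibility:    (B, F, N) float mask, 1 where a tracked point is visible

    # Assumption: the backbone is modified (minimally, as the paper suggests)
    # so the forward pass also exposes intermediate diffusion features.
    noise_pred, feats = unet(noisy_latents, timesteps, cond)  # feats: (B, F, D, h, w)

    # 1) Standard video diffusion (denoising) loss.
    diffusion_loss = F.mse_loss(noise_pred, noise)

    # 2) Point-tracking loss: sample diffusion features at each point's location
    #    in every frame; corresponding points should carry matching features,
    #    providing the explicit spatial supervision described in the abstract.
    B, Fr, D, h, w = feats.shape
    feat_flat = feats.reshape(B * Fr, D, h, w)
    grid = tracks.reshape(B * Fr, -1, 1, 2)                   # (B*F, N, 1, 2)
    point_feats = F.grid_sample(feat_flat, grid, align_corners=True)
    point_feats = point_feats.squeeze(-1).reshape(B, Fr, D, -1)  # (B, F, D, N)

    anchor = point_feats[:, :1]                               # first-frame features
    vis = visibility[:, :, None, :]                           # (B, F, 1, N)
    track_loss = ((point_feats - anchor) ** 2 * vis).sum() / vis.sum().clamp(min=1)

    return diffusion_loss + lambda_track * track_loss
```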