Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

December 8, 2024
Authors: Hyeonho Jeong, Chun-Hao Paul Huang, Jong Chul Ye, Niloy Mitra, Duygu Ceylan
cs.AI

Abstract

While recent foundational video generators produce visually rich output, they still struggle with appearance drift, where objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this is because there is no explicit supervision in terms of spatial tracking at the feature level. We propose Track4Gen, a spatially aware video generator that combines video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. Track4Gen merges the video generation and point tracking tasks into a single network by making minimal changes to existing video generation architectures. Using Stable Video Diffusion as a backbone, Track4Gen demonstrates that it is possible to unify video generation and point tracking, which are typically handled as separate tasks. Our extensive evaluations show that Track4Gen effectively reduces appearance drift, resulting in temporally stable and visually coherent video generation. Project page: hyeonho99.github.io/track4gen
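The core idea admits a short sketch: the standard video diffusion denoising loss is augmented with a point-tracking loss computed on the denoiser's intermediate features, so one network is supervised on both tasks. The code below is a minimal, hypothetical PyTorch-style illustration of such a combined objective; the names `denoiser`, `tracking_head`, `return_features`, and `lambda_track` are illustrative stand-ins, not the paper's released API.

```python
import torch.nn.functional as F

def track4gen_loss(denoiser, tracking_head, noisy_video, noise, timestep,
                   query_points, gt_tracks, lambda_track=0.5):
    """Combined diffusion + point-tracking objective (illustrative sketch).

    denoiser      -- video diffusion backbone (e.g., a Stable Video
                     Diffusion-style UNet) that can also return its
                     intermediate features (hypothetical flag).
    tracking_head -- lightweight head regressing cross-frame point
                     trajectories from those features.
    """
    # Standard diffusion loss: predict the noise added to the video clip.
    pred_noise, features = denoiser(noisy_video, timestep,
                                    return_features=True)
    diffusion_loss = F.mse_loss(pred_noise, noise)

    # Tracking loss: supervise spatial correspondence at the feature
    # level by regressing ground-truth point tracks across frames.
    pred_tracks = tracking_head(features, query_points)
    tracking_loss = F.mse_loss(pred_tracks, gt_tracks)

    return diffusion_loss + lambda_track * tracking_loss
```

Because both terms backpropagate into the same backbone, the generator is encouraged to keep its features spatially consistent across frames, which is the paper's hypothesized remedy for appearance drift.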
