Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Abstract

While recent foundational video generators produce visually rich output, they still struggle with appearance drift: objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this is because these models receive no explicit supervision for spatial tracking at the feature level. We propose Track4Gen, a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. Track4Gen merges the video generation and point tracking tasks into a single network with minimal changes to existing video generation architectures. Using Stable Video Diffusion as a backbone, Track4Gen demonstrates that video generation and point tracking, which are typically handled as separate tasks, can be unified. Our extensive evaluations show that Track4Gen effectively reduces appearance drift, resulting in temporally stable and visually coherent video generation. Project page: hyeonho99.github.io/track4gen
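The central idea above, adding a point-tracking objective on the diffusion features alongside the usual denoising loss, can be illustrated with a minimal NumPy sketch. This is not the Track4Gen implementation: the soft-argmax feature matching, the L1 tracking error, the weight `lam`, the `temperature`, and all function names are hypothetical choices, made only to show how a tracking term and a denoising term can be combined into one scalar objective over shared features.

```python
import numpy as np


def diffusion_loss(eps_pred: np.ndarray, eps_true: np.ndarray) -> float:
    """Epsilon-prediction MSE between predicted and true noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))


def soft_argmax_track(feats: np.ndarray, query_xy: np.ndarray,
                      temperature: float = 0.1) -> np.ndarray:
    """Track query points from frame 0 through every frame by feature matching.

    feats:    (T, H, W, C) per-frame feature maps (hypothetically, diffusion features).
    query_xy: (N, 2) integer (x, y) pixel coordinates of the query points in frame 0.
    Returns   (T, N, 2) predicted (x, y) positions: the softmax-weighted average
              of pixel coordinates under cosine similarity to each query feature.
    """
    T, H, W, C = feats.shape
    # L2-normalise features so that dot products are cosine similarities.
    norm = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    queries = norm[0, query_xy[:, 1], query_xy[:, 0]]  # (N, C)
    # Pixel coordinate grid flattened to (H*W, 2) in (x, y) order, row-major.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
    preds = np.empty((T, len(query_xy), 2))
    for t in range(T):
        sim = queries @ norm[t].reshape(H * W, C).T / temperature  # (N, H*W)
        sim -= sim.max(axis=1, keepdims=True)  # numerical stability for exp
        prob = np.exp(sim)
        prob /= prob.sum(axis=1, keepdims=True)
        preds[t] = prob @ grid  # expected (x, y) under the softmax
    return preds


def joint_loss(eps_pred, eps_true, feats, query_xy, gt_tracks,
               lam: float = 0.5, temperature: float = 0.1) -> float:
    """Hypothetical joint objective: denoising MSE + lam * mean L1 tracking error."""
    track_pred = soft_argmax_track(feats, query_xy, temperature)
    track_loss = float(np.mean(np.abs(track_pred - gt_tracks)))
    return diffusion_loss(eps_pred, eps_true) + lam * track_loss


# Toy example: one-hot features that shift right by one pixel per frame.
H, W = 4, 5
base = np.eye(H * W).reshape(H, W, H * W)
feats = np.stack([np.roll(base, shift=t, axis=1) for t in range(3)])
query_xy = np.array([[0, 0], [2, 3]])
tracks = soft_argmax_track(feats, query_xy, temperature=0.01)
```

In a real training loop, `feats` would be intermediate activations of the denoising network, so minimizing the combined scalar with an autodiff framework would send gradients from both terms into the shared weights; NumPy is used here only to keep the sketch dependency-light.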
