DELTA: Dense Efficient Long-range 3D Tracking for any video

Abstract

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
