ChatPaper.ai

DELTA: Dense Efficient Long-range 3D Tracking for any video

October 31, 2024
作者: Tuan Duc Ngo, Peiye Zhuang, Chuang Gan, Evangelos Kalogerakis, Sergey Tulyakov, Hsin-Ying Lee, Chaoyang Wang
cs.AI

Abstract

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
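The abstract's finding that log-depth is the best depth representation can be illustrated with a minimal sketch. This assumes the common parameterization d_log = log(d); the paper's exact formulation may differ. The key property is that equal steps in log space correspond to equal *relative* depth changes, so near and far points are treated more evenly than with raw metric depth:

```python
import numpy as np

def to_log_depth(depth):
    """Map positive metric depth to log space (assumed parameterization)."""
    return np.log(np.asarray(depth, dtype=np.float64))

def from_log_depth(log_depth):
    """Invert the log mapping back to metric depth."""
    return np.exp(np.asarray(log_depth, dtype=np.float64))

# A geometric sequence of depths (each twice the previous) becomes
# an arithmetic sequence in log space: constant step of log(2).
depths = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
log_d = to_log_depth(depths)

assert np.allclose(from_log_depth(log_d), depths)   # round-trip is exact
assert np.allclose(np.diff(log_d), np.log(2.0))     # uniform spacing in log space
```

In a regression setting, predicting log-depth means the loss penalizes relative error rather than absolute error, which keeps distant points from dominating the objective.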


PDF · November 13, 2024