

DELTA: Dense Efficient Long-range 3D Tracking for any video

October 31, 2024
作者: Tuan Duc Ngo, Peiye Zhuang, Chuang Gan, Evangelos Kalogerakis, Sergey Tulyakov, Hsin-Ying Lee, Chaoyang Wang
cs.AI

Abstract

Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce DELTA, a novel method that efficiently tracks every pixel in 3D space, enabling accurate motion estimation across entire videos. Our approach leverages a joint global-local attention mechanism for reduced-resolution tracking, followed by a transformer-based upsampler to achieve high-resolution predictions. Unlike existing methods, which are limited by computational inefficiency or sparse tracking, DELTA delivers dense 3D tracking at scale, running over 8x faster than previous methods while achieving state-of-the-art accuracy. Furthermore, we explore the impact of depth representation on tracking performance and identify log-depth as the optimal choice. Extensive experiments demonstrate the superiority of DELTA on multiple benchmarks, achieving new state-of-the-art results in both 2D and 3D dense tracking tasks. Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.
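The abstract's finding that log-depth is the best depth representation can be illustrated with a minimal sketch. The helper names below are hypothetical (not from the DELTA codebase); the point is that working in log space compresses the large dynamic range of metric depth, so a fixed prediction error corresponds to a relative (scale-proportional) depth error rather than an absolute one:

```python
import numpy as np

def encode_log_depth(depth, eps=1e-6):
    """Map metric depth (meters) to log space; hypothetical helper."""
    return np.log(np.clip(depth, eps, None))

def decode_log_depth(log_depth):
    """Invert the log encoding back to metric depth."""
    return np.exp(log_depth)

# Round trip preserves depth values exactly (up to float precision).
d = np.array([0.5, 2.0, 10.0, 80.0])
assert np.allclose(decode_log_depth(encode_log_depth(d)), d)

# The same additive error in log space yields a *relative* metric error,
# so near and far points are penalized proportionally to their depth.
err = 0.05
recovered = decode_log_depth(encode_log_depth(d) + err)
relative_error = recovered / d  # constant factor exp(0.05) for all depths
assert np.allclose(relative_error, np.exp(err))
```

This scale-invariance property is one common motivation for log-depth parameterizations in monocular depth and tracking models.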
