TAPIP3D: Tracking Any Point in Persistent 3D Geometry
April 20, 2025
Authors: Bowei Zhang, Lei Ke, Adam W. Harley, Katerina Fragkiadaki
cs.AI
Abstract
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in
monocular RGB and RGB-D videos. TAPIP3D represents videos as camera-stabilized
spatio-temporal feature clouds, leveraging depth and camera motion information
to lift 2D video features into a 3D world space where camera motion is
effectively canceled. TAPIP3D iteratively refines multi-frame 3D motion
estimates within this stabilized representation, enabling robust tracking over
extended periods. To manage the inherent irregularities of 3D point
distributions, we propose a Local Pair Attention mechanism. This 3D
contextualization strategy effectively exploits spatial relationships in 3D,
forming informative feature neighborhoods for precise 3D trajectory estimation.
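
As a concrete illustration, below is a minimal sketch of what a Local-Pair-Attention-style update over irregular 3D neighborhoods could look like. The tensor shapes, the brute-force k-NN, and the distance-penalty bias are illustrative assumptions, not the paper's actual layer.

```python
import torch
import torch.nn.functional as F

def local_pair_attention(query_xyz, query_feat, cloud_xyz, cloud_feat, k=16):
    """Attend over the k nearest 3D neighbors of each query point.

    query_xyz:  (Q, 3) current 3D track estimates
    query_feat: (Q, C) per-track feature vectors
    cloud_xyz:  (N, 3) world-space feature cloud positions
    cloud_feat: (N, C) world-space feature cloud features
    """
    # Build irregular 3D neighborhoods with a brute-force k-NN
    # (a real implementation would use a spatial acceleration structure).
    dists = torch.cdist(query_xyz, cloud_xyz)            # (Q, N)
    knn = dists.topk(k, largest=False).indices           # (Q, k)
    nbr_xyz, nbr_feat = cloud_xyz[knn], cloud_feat[knn]  # (Q, k, 3), (Q, k, C)

    # Relative positions give the attention an explicit 3D spatial bias;
    # a simple distance penalty stands in for a learned positional encoding.
    rel = nbr_xyz - query_xyz[:, None, :]                # (Q, k, 3)
    scale = query_feat.shape[-1] ** -0.5
    content = (nbr_feat @ query_feat[..., None]).squeeze(-1) * scale
    weights = F.softmax(content - rel.norm(dim=-1), dim=-1)

    # Each query pools features from its 3D neighborhood.
    return (weights[..., None] * nbr_feat).sum(dim=1)    # (Q, C)
```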
Our 3D-centric approach significantly outperforms existing 3D point tracking
methods and even enhances 2D tracking accuracy compared to conventional 2D
pixel trackers when accurate depth is available. It supports inference in both
camera coordinates (i.e., unstabilized) and world coordinates, and our results
demonstrate that compensating for camera motion improves tracking performance.
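
For intuition, here is a minimal sketch of the lifting step that produces the stabilized world-space cloud, assuming per-frame depth, intrinsics `K`, and camera-to-world poses are available; the function name and signature are hypothetical, not the paper's API.

```python
import torch

def lift_to_world(feats, depth, K, cam_to_world):
    """feats: (C, H, W); depth: (H, W); K: (3, 3) intrinsics;
    cam_to_world: (4, 4) pose. Returns (H*W, 3) points, (H*W, C) features."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)

    # Back-project pixels to camera space: X_cam = depth * K^{-1} [u, v, 1]^T.
    rays = pix.reshape(-1, 3) @ torch.linalg.inv(K).T   # (H*W, 3)
    pts_cam = rays * depth.reshape(-1, 1)

    # Applying the camera-to-world pose cancels camera motion across frames,
    # yielding the stabilized world-space representation.
    pts_h = torch.cat([pts_cam, torch.ones_like(pts_cam[:, :1])], dim=-1)
    pts_world = (pts_h @ cam_to_world.T)[:, :3]
    return pts_world, feats.reshape(feats.shape[0], -1).T  # (H*W, C)
```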
Our approach replaces the conventional 2D square correlation neighborhoods used
in prior 2D and 3D trackers, leading to more robust and accurate results across
various 3D point tracking benchmarks. Project Page: https://tapip3d.github.io
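
Putting the pieces together, the multi-frame refinement described in the abstract could be organized as the loop below; `gather_context` and `delta_predictor` are hypothetical stand-ins for the paper's learned modules, and the shapes are illustrative.

```python
def refine_tracks(tracks, clouds, gather_context, delta_predictor, num_iters=4):
    """tracks: (T, Q, 3) 3D trajectories over T frames; clouds: per-frame
    stabilized feature clouds. Both helper modules are hypothetical."""
    for _ in range(num_iters):
        # Pool features from each point's 3D neighborhood in every frame,
        # e.g. via the Local Pair Attention sketched above.
        ctx = gather_context(tracks, clouds)       # (T, Q, C)
        # Predict per-point 3D offsets from multi-frame context and update
        # all frames jointly, refining the trajectories iteratively.
        tracks = tracks + delta_predictor(ctx)     # (T, Q, 3)
    return tracks
```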