TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

November 27, 2024
Authors: Jinyuan Qu, Hongyang Li, Shilong Liu, Tianhe Ren, Zhaoyang Zeng, Lei Zhang
cs.AI

Abstract

In this paper, we present TAPTRv3, which builds on TAPTRv2 to improve its point-tracking robustness in long videos. TAPTRv2 is a simple DETR-like framework that can accurately track any point in real-world videos without requiring a cost volume. TAPTRv3 improves on TAPTRv2 by addressing its shortcomings in querying high-quality features from long videos, where the target tracking points normally undergo increasing variation over time. In TAPTRv3, we propose to use both spatial and temporal context to improve feature querying along the spatial and temporal dimensions, for more robust tracking in long videos. For better spatial feature querying, we present Context-aware Cross-Attention (CCA), which leverages the surrounding spatial context to improve the quality of attention scores when querying image features. For better temporal feature querying, we introduce Visibility-aware Long-Temporal Attention (VLTA), which applies temporal attention over all past frames while taking their corresponding visibilities into account. This effectively addresses the feature-drifting problem that TAPTRv2's RNN-like long-temporal modeling introduces. TAPTRv3 surpasses TAPTRv2 by a large margin on most of the challenging datasets and obtains state-of-the-art performance. Even when compared with methods trained on large-scale extra internal data, TAPTRv3 remains competitive.
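The idea behind VLTA, attending over all past frames while accounting for visibility, can be illustrated with a generic masked dot-product attention. The sketch below is not the authors' implementation: the function name, the hard binary visibility mask, the scaling, and the fallback for a point that was never visible are all assumptions made for illustration, and the abstract does not specify how visibility actually enters the attention.

```python
import math


def visibility_aware_temporal_attention(query, past_feats, visible):
    """Illustrative sketch only (not the TAPTRv3 code).

    query:      feature vector of one tracked point in the current frame
    past_feats: list of that point's feature vectors, one per past frame
    visible:    list of booleans, True where the point was visible
    Returns (aggregated_feature, attention_weights).
    """
    d = len(query)
    scores = []
    for feat, vis in zip(past_feats, visible):
        if vis:
            # Scaled dot-product score against a visible past frame.
            dot = sum(q * k for q, k in zip(query, feat))
            scores.append(dot / math.sqrt(d))
        else:
            # Assumption: invisible frames are masked out entirely.
            scores.append(float("-inf"))

    # Assumption: with no visible past frame, keep the query unchanged.
    if all(s == float("-inf") for s in scores):
        return list(query), [0.0] * len(scores)

    # Numerically stable softmax; masked frames get weight exactly 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of past features.
    out = [sum(w * feat[i] for w, feat in zip(weights, past_feats))
           for i in range(d)]
    return out, weights


if __name__ == "__main__":
    q = [1.0, 0.0]
    past = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    vis = [True, True, False]  # the third frame is occluded
    feat, w = visibility_aware_temporal_attention(q, past, vis)
    print(feat, w)
```

Because every visible past frame is attended to directly, an unreliable feature from an occluded frame does not propagate into later frames the way it can in a recurrent, step-by-step update. That is the intuition behind the drift reduction the abstract describes.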
