Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
April 12, 2025
Authors: You Wu, Xucheng Wang, Xiangyang Yang, Mengyuan Liu, Dan Zeng, Hengzhou Ye, Shuiwang Li
cs.AI
Abstract
Single-stream architectures using Vision Transformer (ViT) backbones have
recently shown great potential for real-time UAV tracking. However, frequent
occlusions from obstacles like buildings and trees expose a major drawback:
these models often lack strategies to handle occlusions effectively. New
methods are needed to enhance the occlusion resilience of single-stream ViT
models in aerial tracking. In this work, we propose to learn Occlusion-Robust
Representations (ORR) based on ViTs for UAV tracking by enforcing an invariance
of the feature representation of a target with respect to random masking
operations modeled by a spatial Cox process. The intent is that this random masking
approximately simulates target occlusions, thereby enabling us to learn ViTs
that are robust to target occlusion for UAV tracking. This framework is termed
ORTrack. Additionally, to facilitate real-time applications, we propose an
Adaptive Feature-Based Knowledge Distillation (AFKD) method to create a more
compact tracker, which adaptively mimics the behavior of the teacher model
ORTrack according to the task's difficulty. This student model, dubbed
ORTrack-D, retains much of ORTrack's performance while offering higher
efficiency. Extensive experiments on multiple benchmarks validate the
effectiveness of our method, demonstrating its state-of-the-art performance.
Code is available at https://github.com/wuyou3474/ORTrack.
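
The masking idea in the abstract can be illustrated with a minimal sketch. A Cox process is a Poisson process whose intensity is itself random, so the code below first draws a random intensity for each patch cell, then draws a Poisson count from it, and masks any patch that receives at least one point. The gamma-distributed intensity, the mean-pooling stand-in for a ViT backbone, the squared-error invariance loss, and all function names are assumptions made for illustration; none of them is taken from the ORTrack code.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_cox_mask(grid_h, grid_w, mean_rate=0.3, rng=None):
    """Return a grid of booleans; True marks a masked patch."""
    rng = rng or random.Random()
    mask = []
    for _ in range(grid_h):
        row = []
        for _ in range(grid_w):
            # Random intensity with mean `mean_rate` (gamma is an assumption).
            intensity = rng.gammavariate(2.0, mean_rate / 2.0)
            # Given the intensity, the point count in the cell is Poisson.
            row.append(sample_poisson(intensity, rng) > 0)
        mask.append(row)
    return mask


def pooled_features(patches, mask=None):
    """Hypothetical backbone stand-in: mean-pool patches, zeroing masked ones."""
    dim = len(patches[0][0])
    total, n = [0.0] * dim, 0
    for i, row in enumerate(patches):
        for j, vec in enumerate(row):
            n += 1
            if mask is not None and mask[i][j]:
                continue  # a masked patch contributes zeros
            for d in range(dim):
                total[d] += vec[d]
    return [t / n for t in total]


def invariance_loss(patches, mask):
    """Mean squared error between clean and masked target features."""
    clean = pooled_features(patches)
    masked = pooled_features(patches, mask)
    return sum((c - m) ** 2 for c, m in zip(clean, masked)) / len(clean)


rng = random.Random(0)
# An 8x8 grid of 4-dimensional patch vectors standing in for a target crop.
patches = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
           for _ in range(8)]
mask = sample_cox_mask(8, 8, mean_rate=0.3, rng=rng)
loss = invariance_loss(patches, mask)
print(sum(map(sum, mask)), "patches masked, loss =", loss)
```

In a training setup of this kind, the loss would be minimized alongside the tracking objective so that the backbone's target features change little when patches are hidden.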