Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

Abstract

Single-stream architectures with Vision Transformer (ViT) backbones have recently shown great potential for real-time UAV tracking. However, frequent occlusions from obstacles such as buildings and trees expose a major drawback: these models often lack strategies for handling occlusion effectively. New methods are therefore needed to improve the occlusion resilience of single-stream ViT models in aerial tracking. In this work, we propose to learn Occlusion-Robust Representations (ORR) based on ViTs for UAV tracking by enforcing invariance of the target's feature representation with respect to random masking operations modeled by a spatial Cox process. The random masking is intended to approximately simulate target occlusion, so that the learned ViTs are robust to target occlusion in UAV tracking. This framework is termed ORTrack. Additionally, to facilitate real-time applications, we propose an Adaptive Feature-Based Knowledge Distillation (AFKD) method to create a more compact tracker, which adaptively mimics the behavior of the teacher model ORTrack according to the task's difficulty. This student model, dubbed ORTrack-D, retains much of ORTrack's performance while offering higher efficiency. Extensive experiments on multiple benchmarks validate the effectiveness of our method and demonstrate its state-of-the-art performance. Code is available at https://github.com/wuyou3474/ORTrack.
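To make the masking idea concrete, the following is a minimal illustrative sketch, not the ORTrack implementation. It assumes a discretized Cox process on the patch grid (a gamma-distributed random intensity per cell, followed by a Poisson draw), zeroing of masked patches, a toy linear encoder standing in for the ViT backbone, and a mean-squared-error invariance term. All function names and parameters here (`cox_process_mask`, `toy_encoder`, `invariance_loss`, `mean_rate`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def cox_process_mask(grid_h, grid_w, mean_rate=0.3, rng=None):
    """Sample a binary patch mask from a discretized spatial Cox process.

    A Cox process is a Poisson process whose intensity is itself random.
    Assumption for this sketch: each cell's intensity is gamma-distributed
    with mean `mean_rate`; a cell is masked if its Poisson count is >= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random intensity field: gamma(shape=2, scale=mean_rate/2) has mean mean_rate.
    intensity = rng.gamma(shape=2.0, scale=mean_rate / 2.0, size=(grid_h, grid_w))
    # Conditional on the intensity, point counts per cell are Poisson.
    counts = rng.poisson(intensity)
    return counts > 0  # True marks a masked patch


def toy_encoder(patches, weight):
    """Stand-in for a ViT backbone: linear projection plus mean pooling."""
    return (patches @ weight).mean(axis=0)


def invariance_loss(patches, mask, weight):
    """Mean squared error between features of the clean and masked target."""
    masked = patches.copy()
    masked[mask.ravel()] = 0.0  # zero out the masked patches (an assumption)
    clean_feat = toy_encoder(patches, weight)
    masked_feat = toy_encoder(masked, weight)
    return float(np.mean((clean_feat - masked_feat) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = cox_process_mask(8, 8, rng=rng)   # 8x8 patch grid
    patches = rng.normal(size=(64, 16))      # 64 patches, 16-dim each
    weight = rng.normal(size=(16, 4))        # toy projection matrix
    print("masked patches:", int(mask.sum()))
    print("invariance loss:", invariance_loss(patches, mask, weight))
```

Minimizing a term of this kind during training would push the encoder toward features that change little when parts of the target are hidden, which is the invariance property the abstract describes. The actual intensity model, masking operation, and loss used by ORTrack may differ.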

