SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Abstract

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects. Furthermore, the fixed-window memory approach in the original model does not consider the quality of the memories selected to condition the image features for the next frame, which leads to error propagation in videos. This paper introduces SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. By incorporating temporal motion cues with the proposed motion-aware memory selection mechanism, SAMURAI effectively predicts object motion and refines mask selection, achieving robust, accurate tracking without the need for retraining or fine-tuning. SAMURAI operates in real time and demonstrates strong zero-shot performance across diverse benchmark datasets, showcasing its ability to generalize without fine-tuning. In evaluations, SAMURAI achieves significant improvements in success rate and precision over existing trackers, with a 7.1% AUC gain on LaSOT_{ext} and a 3.5% AO gain on GOT-10k. Moreover, it achieves competitive results compared to fully supervised methods on LaSOT, underscoring its robustness in complex tracking scenarios and its potential for real-world applications in dynamic environments. Code and results are available at https://github.com/yangchris11/samurai.
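The contrast the abstract draws between fixed-window memory and quality-aware memory selection can be illustrated with a minimal sketch. This is not the SAMURAI implementation: the `FrameMemory` class, the scalar `quality` score, and the `min_quality` threshold are hypothetical names and assumptions introduced only for illustration, and the abstract does not specify how memory quality is actually measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameMemory:
    """Hypothetical per-frame memory entry (names are illustrative only)."""
    frame_idx: int
    quality: float  # assumed scalar in [0, 1]; higher means a more reliable mask


def fixed_window(bank: List[FrameMemory], size: int) -> List[FrameMemory]:
    """Baseline: keep the `size` most recent frames, regardless of quality."""
    return bank[-size:] if size > 0 else []


def quality_aware(bank: List[FrameMemory], size: int,
                  min_quality: float) -> List[FrameMemory]:
    """Keep the `size` most recent frames whose quality passes the threshold."""
    if size <= 0:
        return []
    selected: List[FrameMemory] = []
    for mem in reversed(bank):  # walk from newest to oldest
        if mem.quality >= min_quality:
            selected.append(mem)
            if len(selected) == size:
                break
    return selected[::-1]  # restore chronological order


# Frames 2 and 3 stand in for low-quality masks (for example, during occlusion).
bank = [FrameMemory(i, q) for i, q in enumerate([0.9, 0.8, 0.2, 0.1, 0.95])]
recent = [m.frame_idx for m in fixed_window(bank, 3)]
filtered = [m.frame_idx for m in quality_aware(bank, 3, min_quality=0.5)]
```

In this toy example the fixed window keeps the two unreliable frames simply because they are recent, while the filtered selection reaches further back to frames with reliable masks. That is the error-propagation concern the abstract raises, shown here under the stated assumptions only.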
