SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
November 18, 2024
Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
cs.AI
Abstract
The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in
object segmentation tasks but faces challenges in visual object tracking,
particularly when managing crowded scenes with fast-moving or self-occluding
objects. Furthermore, the fixed-window memory approach in the original model
does not consider the quality of memories selected to condition the image
features for the next frame, leading to error propagation in videos. This paper
introduces SAMURAI, an enhanced adaptation of SAM 2 specifically designed for
visual object tracking. By incorporating temporal motion cues with the proposed
motion-aware memory selection mechanism, SAMURAI effectively predicts object
motion and refines mask selection, achieving robust, accurate tracking without
the need for retraining or fine-tuning. SAMURAI operates in real time and
demonstrates strong zero-shot performance across diverse benchmark datasets,
showcasing its ability to generalize without fine-tuning. In evaluations,
SAMURAI achieves significant improvements in success rate and precision over
existing trackers, with a 7.1% AUC gain on LaSOT_ext and a 3.5% AO
gain on GOT-10k. Moreover, it achieves competitive results compared to fully
supervised methods on LaSOT, underscoring its robustness in complex tracking
scenarios and its potential for real-world applications in dynamic
environments. Code and results are available at
https://github.com/yangchris11/samurai.
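
To make the idea of combining motion cues with mask confidence more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not state the motion model, the scoring formula, or any thresholds, so the constant-velocity predictor, the weight `alpha`, the `min_score` threshold, the `max_size` cap, and every function and class name below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_box(prev: Box, last: Box) -> Box:
    """Assumed motion model: shift `last` by its displacement from `prev`."""
    return tuple(l + (l - p) for p, l in zip(prev, last))


@dataclass
class Candidate:
    box: Box           # bounding box of one candidate mask
    mask_score: float  # segmentation confidence in [0, 1]


def select_candidate(cands: List[Candidate], predicted: Box,
                     alpha: float = 0.5) -> Tuple[int, float]:
    """Pick the candidate maximizing alpha * motion IoU + (1 - alpha) * mask score."""
    scores = [alpha * iou(c.box, predicted) + (1 - alpha) * c.mask_score
              for c in cands]
    best = max(range(len(cands)), key=scores.__getitem__)
    return best, scores[best]


def update_memory(memory: List[int], frame_idx: int, score: float,
                  min_score: float = 0.6, max_size: int = 7) -> List[int]:
    """Admit a frame only if its score clears the threshold; cap the bank size."""
    kept = memory + [frame_idx] if score >= min_score else list(memory)
    return kept[-max_size:]


if __name__ == "__main__":
    predicted = predict_box((0, 0, 10, 10), (2, 0, 12, 10))
    candidates = [
        Candidate(box=(4, 0, 14, 10), mask_score=0.7),   # near predicted location
        Candidate(box=(30, 0, 40, 10), mask_score=0.8),  # distractor, higher mask score
    ]
    best, score = select_candidate(candidates, predicted)
    print(best, round(score, 2), update_memory([], 0, score))
```

In this toy case the distractor has the higher mask confidence but lies far from the predicted location, so the combined score favors the first candidate. Frames whose combined score falls below `min_score` are kept out of the memory list, which illustrates (under the assumptions above) how low-quality memories might be prevented from conditioning later frames.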
AI-Generated Summary