SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
November 18, 2024
Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
cs.AI
Abstract
The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in
object segmentation tasks but faces challenges in visual object tracking,
particularly when managing crowded scenes with fast-moving or self-occluding
objects. Furthermore, the fixed-window memory approach in the original model
does not consider the quality of memories selected to condition the image
features for the next frame, leading to error propagation in videos. This paper
introduces SAMURAI, an enhanced adaptation of SAM 2 specifically designed for
visual object tracking. By incorporating temporal motion cues with the proposed
motion-aware memory selection mechanism, SAMURAI effectively predicts object
motion and refines mask selection, achieving robust, accurate tracking without
the need for retraining or fine-tuning. SAMURAI operates in real time and
demonstrates strong zero-shot performance across diverse benchmark datasets,
showcasing its ability to generalize without fine-tuning. In evaluations,
SAMURAI achieves significant improvements in success rate and precision over
existing trackers, with a 7.1% AUC gain on LaSOT_{ext} and a 3.5% AO
gain on GOT-10k. Moreover, it achieves competitive results compared to fully
supervised methods on LaSOT, underscoring its robustness in complex tracking
scenarios and its potential for real-world applications in dynamic
environments. Code and results are available at
https://github.com/yangchris11/samurai.
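
The two ideas named in the abstract, using motion cues to refine mask selection and choosing memories by quality rather than by a fixed window, can be illustrated with a minimal sketch. This is not the authors' implementation: the constant-velocity predictor, the weighted score, the thresholds, and every name below (`predict_box`, `select_candidate`, `select_memory`, `motion_weight`, `min_mask`, `min_motion`) are assumptions made purely for illustration. See the linked repository for the actual method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_box(prev: Box, last: Box) -> Box:
    """Assumed constant-velocity motion cue: shift `last` by its last displacement."""
    return tuple(l + (l - p) for p, l in zip(prev, last))


@dataclass
class Candidate:
    box: Box           # bounding box of a candidate mask
    mask_score: float  # hypothetical per-mask confidence in [0, 1]


def select_candidate(cands: Sequence[Candidate], predicted: Box,
                     motion_weight: float = 0.5) -> int:
    """Return the index of the candidate with the best blend of
    agreement with the predicted box and mask confidence."""
    scores = [
        motion_weight * iou(c.box, predicted) + (1.0 - motion_weight) * c.mask_score
        for c in cands
    ]
    return max(range(len(cands)), key=scores.__getitem__)


@dataclass
class FrameRecord:
    index: int
    mask_score: float    # hypothetical mask quality for this frame
    motion_score: float  # hypothetical agreement with the motion prediction


def select_memory(history: Sequence[FrameRecord], bank_size: int = 3,
                  min_mask: float = 0.5, min_motion: float = 0.5) -> List[int]:
    """Walk back from the newest frame and keep up to `bank_size` frames
    that pass both thresholds, instead of simply the last `bank_size` frames."""
    kept: List[int] = []
    for rec in reversed(history):
        if rec.mask_score >= min_mask and rec.motion_score >= min_motion:
            kept.append(rec.index)
            if len(kept) == bank_size:
                break
    return kept


# Usage: the object moved 5 px right between the two previous frames.
predicted = predict_box((0, 0, 10, 10), (5, 0, 15, 10))
cands = [Candidate((10, 0, 20, 10), 0.6), Candidate((0, 0, 10, 10), 0.9)]
best = select_candidate(cands, predicted)

history = [
    FrameRecord(0, 0.9, 0.9),
    FrameRecord(1, 0.2, 0.9),  # poor mask, skipped
    FrameRecord(2, 0.9, 0.1),  # disagrees with motion, skipped
    FrameRecord(3, 0.8, 0.7),
    FrameRecord(4, 0.9, 0.9),
]
memory = select_memory(history)
```

In this toy run the first candidate is chosen even though its mask confidence is lower, because it matches the predicted position, and the memory bank skips the two low-quality frames that a fixed window over the most recent frames would have included.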