SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Abstract

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects. Furthermore, the fixed-window memory approach in the original model does not consider the quality of memories selected to condition the image features for the next frame, leading to error propagation in videos. This paper introduces SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. By incorporating temporal motion cues with the proposed motion-aware memory selection mechanism, SAMURAI effectively predicts object motion and refines mask selection, achieving robust, accurate tracking without the need for retraining or fine-tuning. SAMURAI operates in real-time and demonstrates strong zero-shot performance across diverse benchmark datasets, showcasing its ability to generalize without fine-tuning. In evaluations, SAMURAI achieves significant improvements in success rate and precision over existing trackers, with a 7.1% AUC gain on LaSOT_{ext} and a 3.5% AO gain on GOT-10k. Moreover, it achieves competitive results compared to fully supervised methods on LaSOT, underscoring its robustness in complex tracking scenarios and its potential for real-world applications in dynamic environments. Code and results are available at https://github.com/yangchris11/samurai.
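The contrast the abstract draws between fixed-window memory and quality-aware memory selection can be illustrated with a minimal sketch. This is not the SAMURAI implementation: the abstract does not specify how memories are scored, so the `FrameMemory` class, the `mask_score` and `motion_score` fields, and the 0.5 thresholds are all hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameMemory:
    frame_idx: int
    mask_score: float    # hypothetical: confidence of the predicted mask
    motion_score: float  # hypothetical: agreement between the mask and a motion-based prediction


def fixed_window(memories: List[FrameMemory], window: int = 3) -> List[int]:
    """Baseline: take the most recent `window` frames regardless of quality."""
    return [m.frame_idx for m in memories[-window:]]


def quality_aware(
    memories: List[FrameMemory],
    window: int = 3,
    min_mask: float = 0.5,    # assumed threshold, not from the paper
    min_motion: float = 0.5,  # assumed threshold, not from the paper
) -> List[int]:
    """Walk back from the newest frame, keeping only frames that pass both thresholds."""
    picked: List[int] = []
    for m in reversed(memories):
        if m.mask_score >= min_mask and m.motion_score >= min_motion:
            picked.append(m.frame_idx)
            if len(picked) == window:
                break
    return picked[::-1]  # restore chronological order


# Toy history: frame 4 is a low-quality frame (for example, the target is occluded).
history = [
    FrameMemory(0, 0.9, 0.9),
    FrameMemory(1, 0.8, 0.9),
    FrameMemory(2, 0.9, 0.8),
    FrameMemory(3, 0.9, 0.9),
    FrameMemory(4, 0.2, 0.1),
    FrameMemory(5, 0.8, 0.7),
]

print(fixed_window(history))   # includes the unreliable frame 4
print(quality_aware(history))  # skips frame 4 and reaches further back
```

In this toy history the fixed window selects frames 3, 4, and 5, so the unreliable frame 4 conditions the next prediction, whereas the filtered variant selects frames 2, 3, and 5. That is the error-propagation problem the abstract attributes to ignoring memory quality, shown here only under the assumed scoring scheme.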
