Self-Supervised Any-Point Tracking by Contrastive Random Walks
September 24, 2024
Authors: Ayush Shrivastava, Andrew Owens
cs.AI
Abstract
We present a simple, self-supervised approach to the Tracking Any Point (TAP)
problem. We train a global matching transformer to find cycle-consistent tracks
through video via contrastive random walks, using the transformer's
attention-based global matching to define the transition matrices for a random
walk on a space-time graph. The ability to perform "all pairs" comparisons
between points allows the model to obtain high spatial precision and
a strong contrastive learning signal, while avoiding many of the complexities
of recent approaches (such as coarse-to-fine matching). To do this, we propose
a number of design decisions that allow global matching architectures to be
trained through self-supervision using cycle consistency. For example, we
identify that transformer-based methods are sensitive to shortcut solutions,
and propose a data augmentation scheme to address them. Our method achieves
strong performance on the TapVid benchmarks, outperforming previous
self-supervised tracking methods, such as DIFT, and is competitive with several
supervised methods.
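To make the cycle-consistency objective concrete, below is a minimal NumPy sketch of a contrastive random walk in the spirit the abstract describes: pairwise feature similarities are softmax-normalized into transition matrices, the walker steps forward through the frames and back, and the loss rewards walks that return to their starting point. This is an illustrative toy, not the paper's transformer architecture; the function names, the temperature value, and the use of cosine similarity are assumptions for the sketch.

```python
import numpy as np

def transition(a, b, tau=0.07):
    # Row-stochastic transition matrix between two frames' point features.
    # a: (N, C), b: (M, C). Cosine similarity, then a temperature softmax
    # (tau is a hypothetical choice, not a value from the paper).
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    sim = (a @ b.T) / tau
    sim -= sim.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(sim)
    return e / e.sum(axis=-1, keepdims=True)

def cycle_consistency_loss(feats):
    # feats: list of (N, C) arrays, one per frame of the video.
    # Compose transitions forward through time, then backward; a
    # cycle-consistent walker should land back on its starting point.
    N = feats[0].shape[0]
    P = np.eye(N)
    for a, b in zip(feats[:-1], feats[1:]):        # forward walk
        P = P @ transition(a, b)
    for a, b in zip(feats[:0:-1], feats[-2::-1]):  # backward walk
        P = P @ transition(a, b)
    # Cross-entropy against the identity target: the probability mass
    # on the diagonal of the round-trip matrix should be maximized.
    return -np.mean(np.log(np.diag(P) + 1e-8))
```

Because the targets are simply "return to yourself", no ground-truth correspondences are needed, which is what makes the objective self-supervised; the "all pairs" comparison shows up here as the dense N×M similarity matrix inside `transition`.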