Edge Weight Prediction For Category-Agnostic Pose Estimation
November 25, 2024
Authors: Or Hirschorn, Shai Avidan
cs.AI
Abstract
Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse
object categories with a single model, using one or a few annotated support
images. Recent works have shown that using a pose graph (i.e., treating
keypoints as nodes in a graph rather than isolated points) helps handle
occlusions and break symmetry. However, these methods assume a static pose
graph with equal-weight edges, leading to suboptimal results. We introduce
EdgeCape, a novel framework that overcomes these limitations by predicting the
graph's edge weights, thereby optimizing localization. To further leverage
structural priors, we propose integrating Markovian Structural Bias, which
modulates the self-attention interaction between nodes based on the number of
hops between them. We show that this improves the model's ability to capture
global spatial dependencies. Evaluated on the MP-100 benchmark, which includes
100 categories and over 20K images, EdgeCape achieves state-of-the-art results
in the 1-shot setting and leads among similar-sized methods in the 5-shot
setting, significantly improving keypoint localization accuracy. Our code is
publicly available.
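To make the structural-bias idea concrete, below is a minimal sketch of hop-based attention biasing: attention logits between keypoint nodes are offset by a penalty that grows with their hop distance in the pose graph, so nearby joints interact more strongly than distant ones. All names, the decay hyperparameter `gamma`, and the exact form of the bias are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (assumptions, not EdgeCape's actual code): modulate
# self-attention between keypoint nodes by their hop distance in the pose graph.
import numpy as np

def hop_distances(adj: np.ndarray) -> np.ndarray:
    """All-pairs shortest hop counts of an unweighted graph via BFS expansion."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    reached = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    for k in range(1, n):
        # Nodes first reached at exactly k hops.
        frontier = ((frontier.astype(int) @ (adj > 0).astype(int)) > 0) & ~reached
        if not frontier.any():
            break
        dist[frontier] = k
        reached |= frontier
    return dist

def hop_biased_attention(logits: np.ndarray, adj: np.ndarray,
                         gamma: float = 0.5) -> np.ndarray:
    """Add a hop-dependent penalty to raw attention logits, then softmax.

    `gamma` in (0, 1) is a hypothetical per-hop decay: adding hops * log(gamma)
    to the logits is equivalent to scaling the unnormalized attention weights
    by gamma**hops. Disconnected pairs are effectively masked out.
    """
    hops = hop_distances(adj)
    bias = np.where(np.isinf(hops), -1e9, hops * np.log(gamma))
    scores = logits + bias
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy pose graph: a "star" skeleton where node 0 (e.g., torso) links nodes 1-3.
adj = np.array([[0, 1, 1, 1],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0]], dtype=float)
attn = hop_biased_attention(np.zeros((4, 4)), adj)
print(attn.round(3))  # rows sum to 1; 1-hop pairs get more mass than 2-hop pairs
```

With uniform (zero) logits, each row of `attn` shows attention mass decaying geometrically with hop count, which is the intuition behind using graph structure as a prior on which keypoints should attend to each other.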