ChatPaper.ai

Edge Weight Prediction For Category-Agnostic Pose Estimation

November 25, 2024
Authors: Or Hirschorn, Shai Avidan
cs.AI

Abstract

Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or a few annotated support images. Recent works have shown that using a pose graph (i.e., treating keypoints as nodes in a graph rather than isolated points) helps handle occlusions and break symmetry. However, these methods assume a static pose graph with equal-weight edges, leading to suboptimal results. We introduce EdgeCape, a novel framework that overcomes these limitations by predicting the graph's edge weights, which improves localization. To further leverage structural priors, we propose integrating a Markovian Structural Bias, which modulates the self-attention interaction between nodes based on the number of hops between them. We show that this improves the model's ability to capture global spatial dependencies. Evaluated on the MP-100 benchmark, which includes 100 categories and over 20K images, EdgeCape achieves state-of-the-art results in the 1-shot setting and leads among similar-sized methods in the 5-shot setting, significantly improving keypoint localization accuracy. Our code is publicly available.
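The hop-based bias described in the abstract can be illustrated with a minimal sketch. This is a hypothetical NumPy implementation, not the authors' code: it assumes a learned per-hop scalar bias `hop_bias[h]` that is added to the attention logits for every pair of keypoint nodes whose shortest-path distance in the pose graph is `h` hops.

```python
# Hypothetical sketch of hop-conditioned self-attention (assumed, not EdgeCape's code).
import numpy as np
from collections import deque

def hop_distances(adj):
    """All-pairs shortest-path hop counts on an adjacency list, via BFS."""
    n = len(adj)
    dist = np.full((n, n), n, dtype=int)  # n acts as an "unreachable" sentinel
    for s in range(n):
        dist[s, s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if dist[s, v] > dist[s, u] + 1:
                    dist[s, v] = dist[s, u] + 1
                    q.append(v)
    return dist

def biased_attention(Q, K, V, adj, hop_bias):
    """Self-attention whose logits get an additive bias indexed by hop count."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    # Clamp distances so hops beyond the bias table share its last entry.
    hops = np.minimum(hop_distances(adj), len(hop_bias) - 1)
    logits = logits + hop_bias[hops]  # bias depends only on graph distance
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # row-wise softmax
    return w @ V
```

In this sketch, nearby keypoints (few hops apart) can be encouraged to attend to each other more strongly than distant ones, which is one plausible way to inject the skeleton's structure into global attention.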

