Edge Weight Prediction For Category-Agnostic Pose Estimation

Abstract

Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or a few annotated support images. Recent works have shown that using a pose graph (i.e., treating keypoints as nodes in a graph rather than isolated points) helps handle occlusions and break symmetry. However, these methods assume a static pose graph with equal-weight edges, leading to suboptimal results. We introduce EdgeCape, a novel framework that overcomes these limitations by predicting the graph's edge weights, which optimizes localization. To further leverage structural priors, we propose integrating Markovian Structural Bias, which modulates the self-attention interaction between nodes based on the number of hops between them. We show that this improves the model's ability to capture global spatial dependencies. Evaluated on the MP-100 benchmark, which includes 100 categories and over 20K images, EdgeCape achieves state-of-the-art results in the 1-shot setting and leads among similar-sized methods in the 5-shot setting, significantly improving keypoint localization accuracy. Our code is publicly available.
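To make the idea of hop-based modulation concrete, the following is a minimal sketch and not the EdgeCape implementation. It assumes an unweighted, undirected pose graph and a simple linear penalty on the attention logits; the function names `hop_distances` and `biased_attention` and the `decay` parameter are hypothetical and chosen only for illustration. The abstract does not specify the exact form of the bias, nor how the predicted edge weights enter it.

```python
from collections import deque
import math


def hop_distances(num_nodes, edges):
    """All-pairs hop counts on an undirected graph via BFS.

    Unreachable pairs keep the value math.inf.
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for source in range(num_nodes):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if dist[source][neighbor] == math.inf:
                    dist[source][neighbor] = dist[source][node] + 1
                    queue.append(neighbor)
    return dist


def biased_attention(scores, hops, decay=0.5):
    """Row-wise softmax of scores[i][j] - decay * hops[i][j].

    The linear penalty is an illustrative assumption, not the paper's
    formula. decay must be positive; unreachable pairs get zero weight.
    """
    weights = []
    for score_row, hop_row in zip(scores, hops):
        logits = [s - decay * h for s, h in zip(score_row, hop_row)]
        peak = max(logits)  # finite, because hops[i][i] == 0
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy 4-keypoint chain 0-1-2-3 with uniform raw attention scores.
hops = hop_distances(4, [(0, 1), (1, 2), (2, 3)])  # hops[0] == [0, 1, 2, 3]
weights = biased_attention([[0.0] * 4 for _ in range(4)], hops)
```

With uniform raw scores, each keypoint in this toy chain attends most strongly to itself and progressively less to keypoints that are more hops away, which is the qualitative behavior a hop-dependent structural bias is meant to encode.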

