Novel Object 6D Pose Estimation with a Single Reference View
March 7, 2025
Authors: Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Lin Wang, Hossein Rahmani, Ajmal Mian
cs.AI
Abstract
Existing novel object 6D pose estimation methods typically rely on CAD models
or dense reference views, which are both difficult to acquire. Using only a
single reference view is more scalable, but challenging due to large pose
discrepancies and limited geometric and spatial information. To address these
issues, we propose a Single-Reference-based novel object 6D pose estimation
method (SinRef-6D). Our key idea is to iteratively establish point-wise
alignment in the camera coordinate system based on state space models (SSMs).
Specifically, iterative camera-space point-wise alignment can effectively
handle large pose discrepancies, while our proposed RGB and Points SSMs can
capture long-range dependencies and spatial information from a single view,
offering linear complexity and superior spatial modeling capability. Once
pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel
object using only a single reference view, without requiring retraining or a
CAD model. Extensive experiments on six popular datasets and real-world robotic
scenes demonstrate that our method achieves performance on par with CAD-based
and dense-reference-view-based methods, despite operating in the more
challenging single-reference setting. Code will be released at
https://github.com/CNJianLiu/SinRef-6D.
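To make "iterative point-wise alignment in the camera coordinate system" concrete, below is a minimal classical sketch. In SinRef-6D the point-wise correspondences come from learned SSM networks; here a nearest-neighbour stand-in plays that role, and each round solves a least-squares rigid transform with the Kabsch algorithm so a large initial pose discrepancy shrinks step by step. All names and the correspondence step are illustrative assumptions, not the paper's actual pipeline.

```python
# Conceptual sketch only: iterative point-wise rigid alignment in the camera
# frame. The paper's learned SSM correspondence predictor is replaced by a
# nearest-neighbour stand-in; function names and structure are assumptions.
import numpy as np

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) mapping src onto dst, via SVD."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def iterative_alignment(ref_pts, obs_pts, iters=5):
    """Refine (R, t) over several rounds so a large initial pose gap shrinks."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = ref_pts @ R.T + t
        # Stand-in for a learned point-wise correspondence predictor:
        nn = ((moved[:, None] - obs_pts[None]) ** 2).sum(-1).argmin(axis=1)
        R_d, t_d = kabsch(moved, obs_pts[nn])
        R, t = R_d @ R, R_d @ t + t_d           # compose the pose update
    return R, t
```

The outer loop is what "iterative" refers to: each pass re-estimates correspondences under the current pose and solves a fresh rigid update, which is why large reference-to-query pose differences can be absorbed gradually rather than in one shot.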
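The linear-complexity claim follows from the sequential scan form of state space models. As a generic textbook illustration (the paper's RGB and Points SSM designs are not detailed in the abstract, so all shapes and names here are assumptions), the sketch below computes the standard discretized recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t in a single pass over the sequence.

```python
# Generic discretized state-space recurrence, computed in one pass over the
# sequence: O(L) in sequence length, unlike attention's O(L^2) pairwise cost.
# This is a textbook SSM, not the paper's RGB/Points SSM; all shapes and
# names are assumptions made for illustration.
import numpy as np

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray):
    """x: (L, D) token features; A: (N, N); B: (N, D); C: (D, N)."""
    L, _ = x.shape
    h = np.zeros(A.shape[0])
    y = np.empty_like(x, dtype=float)
    for t in range(L):            # single sequential scan over the tokens
        h = A @ h + B @ x[t]      # h_t = A h_{t-1} + B x_t
        y[t] = C @ h              # y_t = C h_t
    return y
```

Because the hidden state h carries a summary of everything seen so far, each output can depend on arbitrarily distant inputs, which is the sense in which such models capture long-range dependencies at linear cost.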