Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Abstract

Orientation is a key attribute of objects, crucial for understanding their spatial pose and arrangement in images. However, practical solutions for accurately estimating orientation from a single image remain underexplored. In this work, we introduce Orient Anything, the first expert and foundational model designed to estimate object orientation from a single, free-view image. Because labeled data is scarce, we propose extracting knowledge from the 3D world. By developing a pipeline that annotates the front face of 3D objects and renders images from random views, we collect 2M images with precise orientation annotations. To fully leverage the dataset, we design a robust training objective that models the 3D orientation as probability distributions over three angles and predicts the object orientation by fitting these distributions. In addition, we employ several strategies to improve synthetic-to-real transfer. Our model achieves state-of-the-art orientation estimation accuracy on both rendered and real images and exhibits impressive zero-shot ability in various scenarios. More importantly, our model enhances many applications, such as comprehension and generation of complex spatial concepts and 3D object pose adjustment.
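To make the idea of fitting angle distributions concrete, the following is a minimal sketch rather than the authors' implementation. The abstract only states that orientation is modeled as probability distributions over three angles, so everything else here is an assumption made for illustration: the 1-degree bins, the wrapped Gaussian target, the value of `sigma_deg`, treating every angle as periodic over 360 degrees, and all function names.

```python
import math

def circular_gaussian_target(angle_deg, num_bins=360, sigma_deg=20.0):
    """Soft label: a wrapped Gaussian over [0, 360) centered on angle_deg.

    num_bins and sigma_deg are illustrative assumptions, not values
    taken from the abstract.
    """
    bin_width = 360.0 / num_bins
    weights = []
    for i in range(num_bins):
        center = i * bin_width
        # Shortest angular distance, so 359 and 1 degrees are 2 degrees apart.
        diff = abs(center - angle_deg) % 360.0
        diff = min(diff, 360.0 - diff)
        weights.append(math.exp(-0.5 * (diff / sigma_deg) ** 2))
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def cross_entropy(target, logits):
    """Cross-entropy between a soft target and the predicted distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p + 1e-12) for t, p in zip(target, probs))

def orientation_loss(angles_deg, logits_per_angle):
    """Sum of per-angle losses for three angles (hypothetical formulation)."""
    return sum(
        cross_entropy(circular_gaussian_target(a), logits)
        for a, logits in zip(angles_deg, logits_per_angle)
    )

def decode_angle(logits):
    """Predicted angle: the center of the highest-scoring bin."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best * (360.0 / len(logits))

# Usage: logits that match the target exactly decode back to the label.
target = circular_gaussian_target(350.0)
matched_logits = [math.log(t + 1e-12) for t in target]
uniform_logits = [0.0] * 360
predicted = decode_angle(matched_logits)
```

A soft target of this kind gives partial credit to near-miss bins, and the wrapped distance keeps 359 degrees and 1 degree close together. A one-hot label or a direct regression on the raw angle value provides neither property.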
