
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

December 24, 2024
Authors: Zehan Wang, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao
cs.AI

Abstract

Orientation is a key attribute of objects, crucial for understanding their spatial pose and arrangement in images. However, practical solutions for accurate orientation estimation from a single image remain underexplored. In this work, we introduce Orient Anything, the first expert and foundational model designed to estimate object orientation in a single- and free-view image. Due to the scarcity of labeled data, we propose extracting knowledge from the 3D world. By developing a pipeline to annotate the front face of 3D objects and render images from random views, we collect 2M images with precise orientation annotations. To fully leverage the dataset, we design a robust training objective that models the 3D orientation as probability distributions of three angles and predicts the object orientation by fitting these distributions. Besides, we employ several strategies to improve synthetic-to-real transfer. Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images and exhibits impressive zero-shot ability in various scenarios. More importantly, our model enhances many applications, such as comprehension and generation of complex spatial concepts and 3D object pose adjustment.
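The central technical idea in the abstract is to represent 3D orientation as probability distributions over three angles and to train the model by fitting those distributions rather than regressing scalar values. The sketch below illustrates one way such an objective could look in PyTorch; the angle parameterization (azimuth, polar, in-plane rotation), the bin counts, the Gaussian smoothing width, and the KL-divergence loss are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OrientationHead(nn.Module):
    """Sketch of a head predicting distributions over three discretized angles.

    Hypothetical setup: azimuth in [0, 360), polar and in-plane rotation in
    [-90, 90], each discretized into 1-degree bins (bin counts are assumptions).
    """

    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.azimuth = nn.Linear(feat_dim, 360)   # logits over azimuth bins
        self.polar = nn.Linear(feat_dim, 181)     # logits over polar bins
        self.rotation = nn.Linear(feat_dim, 181)  # logits over rotation bins

    def forward(self, feats: torch.Tensor):
        return self.azimuth(feats), self.polar(feats), self.rotation(feats)


def smoothed_target(gt_bin: torch.Tensor, num_bins: int, sigma: float = 2.0) -> torch.Tensor:
    """Turn a ground-truth bin index into a Gaussian-smoothed target distribution."""
    bins = torch.arange(num_bins, dtype=torch.float32, device=gt_bin.device)
    weights = torch.exp(-0.5 * ((bins - gt_bin.float().unsqueeze(-1)) / sigma) ** 2)
    return weights / weights.sum(dim=-1, keepdim=True)


def orientation_loss(logits, gt_bins):
    """Fit predicted distributions to smoothed targets with KL divergence
    (one plausible choice of divergence for distribution fitting)."""
    loss = 0.0
    for logit, gt, num_bins in zip(logits, gt_bins, (360, 181, 181)):
        target = smoothed_target(gt, num_bins)
        loss = loss + F.kl_div(F.log_softmax(logit, dim=-1), target, reduction="batchmean")
    return loss
```

At inference time, a point estimate of each angle could be recovered as the argmax or expectation over its predicted distribution; fitting distributions instead of single values is the kind of robust objective the abstract refers to.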
