DiMeR: Disentangled Mesh Reconstruction Model

April 24, 2025
作者: Lutao Jiang, Jiantao Lin, Kanghao Chen, Wenhang Ge, Xin Yang, Yifan Jiang, Yuanhuiyi Lyu, Xu Zheng, Yingcong Chen
cs.AI

Abstract

With the advent of large-scale 3D datasets, feed-forward 3D generative models, such as the Large Reconstruction Model (LRM), have gained significant attention and achieved remarkable success. However, we observe that RGB images often lead to conflicting training objectives and lack the necessary clarity for geometry reconstruction. In this paper, we revisit the inductive biases associated with mesh reconstruction and introduce DiMeR, a novel disentangled dual-stream feed-forward model for sparse-view mesh reconstruction. The key idea is to disentangle both the input and the framework into geometry and texture parts, thereby reducing the training difficulty of each part according to the Principle of Occam's Razor. Given that normal maps are strictly consistent with geometry and accurately capture surface variations, we utilize normal maps as the exclusive input for the geometry branch, reducing the complexity of the mapping between the network's input and output. Moreover, we improve the mesh extraction algorithm to introduce 3D ground truth supervision. As for the texture branch, we use RGB images as input to obtain the textured mesh. Overall, DiMeR demonstrates robust capabilities across various tasks, including sparse-view reconstruction, single-image-to-3D, and text-to-3D. Extensive experiments show that DiMeR significantly outperforms previous methods, achieving over 30% improvement in Chamfer Distance on the GSO and OmniObject3D datasets.
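To make the dual-stream idea concrete, below is a minimal sketch of the disentanglement described in the abstract: a geometry branch that sees only normal maps and a texture branch that sees only RGB images. All module names, shapes, and hyperparameters here are illustrative assumptions, not the authors' implementation (which builds on an LRM-style transformer backbone and a mesh extraction stage).

```python
# A minimal sketch of the disentangled dual-stream design described in the
# abstract. Every name and shape below is a placeholder assumption, not the
# authors' architecture.
import torch
import torch.nn as nn


class Branch(nn.Module):
    """Placeholder encoder standing in for one stream of the model
    (the actual geometry/texture branches are far larger)."""

    def __init__(self, in_channels: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, out_dim),
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (num_views, channels, H, W)
        return self.net(views)


class DisentangledReconstructor(nn.Module):
    """Geometry is predicted only from normal maps; texture only from RGB.

    Keeping the inputs disjoint is the point of the disentanglement: the
    geometry branch never has to explain away appearance, and vice versa.
    """

    def __init__(self, geo_dim: int = 128, tex_dim: int = 128):
        super().__init__()
        self.geometry_branch = Branch(in_channels=3, out_dim=geo_dim)  # normal maps
        self.texture_branch = Branch(in_channels=3, out_dim=tex_dim)   # RGB images

    def forward(self, normals: torch.Tensor, rgbs: torch.Tensor):
        geo_feat = self.geometry_branch(normals)  # would drive mesh extraction
        tex_feat = self.texture_branch(rgbs)      # would drive texture prediction
        return geo_feat, tex_feat


# Usage: four sparse input views, each with a normal map and an RGB image.
model = DisentangledReconstructor()
normals = torch.randn(4, 3, 256, 256)
rgbs = torch.randn(4, 3, 256, 256)
geo_feat, tex_feat = model(normals, rgbs)
print(geo_feat.shape, tex_feat.shape)  # torch.Size([4, 128]) torch.Size([4, 128])
```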
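For reference, the Chamfer Distance cited in the results is the standard symmetric point-set metric, typically computed between point clouds sampled from the predicted and ground-truth meshes (whether the squared or unsquared variant is used varies by benchmark):

$$
d_{\mathrm{CD}}(X, Y) = \frac{1}{|X|} \sum_{x \in X} \min_{y \in Y} \lVert x - y \rVert_2^2 \;+\; \frac{1}{|Y|} \sum_{y \in Y} \min_{x \in X} \lVert y - x \rVert_2^2
$$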
