Structured 3D Latents for Scalable and Versatile 3D Generation
December 2, 2024
Authors: Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, Jiaolong Yang
cs.AI
Abstract
We introduce a novel 3D generation method for versatile and high-quality 3D
asset creation. The cornerstone is a unified Structured LATent (SLAT)
representation which allows decoding to different output formats, such as
Radiance Fields, 3D Gaussians, and meshes. This is achieved by integrating a
sparsely-populated 3D grid with dense multiview visual features extracted from
a powerful vision foundation model, comprehensively capturing both structural
(geometry) and textural (appearance) information while maintaining flexibility
during decoding. We employ rectified flow transformers tailored for SLAT as our
3D generation models and train models with up to 2 billion parameters on a
large 3D asset dataset of 500K diverse objects. Our model generates
high-quality results with text or image conditions, significantly surpassing
existing methods, including recent ones at similar scales. We showcase flexible
output format selection and local 3D editing capabilities which were not
offered by previous models. Code, model, and data will be released.
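To make the two ingredients named in the abstract more concrete, the sketch below illustrates, under stated assumptions rather than as the released implementation, how a structured latent could be represented as sparse voxel coordinates paired with per-voxel features, and how a rectified-flow sampler integrates a learned velocity field with plain Euler steps. The class and function names, grid resolution, step count, and the `model` interface are all hypothetical placeholders.

```python
import torch

# Minimal sketch (not the authors' code): a SLAT-like structured latent is a
# sparse set of occupied voxels, each carrying a local latent feature vector
# aggregated from multiview image features of a vision foundation model.
# Resolution, feature width, and the sampler below are assumptions.

class StructuredLatent:
    def __init__(self, coords: torch.Tensor, feats: torch.Tensor, resolution: int = 64):
        # coords: (N, 3) integer indices of occupied voxels
        # feats:  (N, C) latent feature per occupied voxel
        assert coords.shape[0] == feats.shape[0]
        self.coords, self.feats, self.resolution = coords, feats, resolution

def sample_rectified_flow(model, latent: StructuredLatent, steps: int = 25) -> StructuredLatent:
    """Euler integration of a rectified-flow velocity field over voxel features.

    `model(feats, coords, t)` stands in for a transformer that predicts the
    velocity for each occupied voxel at time t in [0, 1]; its signature is
    assumed, not taken from the paper.
    """
    x = torch.randn_like(latent.feats)          # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt)   # per-token time stamp
        v = model(x, latent.coords, t)          # predicted velocity
        x = x + v * dt                          # straight-line (rectified) update
    return StructuredLatent(latent.coords, x, latent.resolution)
```

Per the abstract, decoding such a latent into Radiance Fields, 3D Gaussians, or meshes would then be handled by separate format-specific decoders operating on the same structured representation.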