GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

November 12, 2024
Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
cs.AI

Abstract

While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.
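The abstract compresses several moving parts: a VAE that maps posed RGB-D-N renderings into a point-cloud-structured latent, and a cascaded diffusion model that separates shape from texture. The sketch below illustrates, under assumed shapes and toy networks, how these pieces could fit together: the VAE produces per-point positions plus features, stage one of the cascade denoises positions (geometry), and stage two denoises features conditioned on those positions (texture). All class names, dimensions, and the naive denoising loop are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract
# (VAE -> point-cloud latent -> geometry diffusion -> texture diffusion).
import torch
import torch.nn as nn

class PointCloudVAE(nn.Module):
    """Encodes multi-view posed RGB-D(epth)-N(ormal) renderings into a
    point-cloud-structured latent: per-point xyz plus a feature vector."""
    def __init__(self, n_points=512, feat_dim=16):
        super().__init__()
        self.n_points, self.feat_dim = n_points, feat_dim
        self.backbone = nn.Sequential(          # shared per-view CNN encoder
            nn.Conv2d(7, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_latent = nn.Linear(64, n_points * (3 + 2 * feat_dim))

    def encode(self, views):
        """views: (B, V, 7, H, W) with channels RGB(3)+depth(1)+normal(3)."""
        B, V = views.shape[:2]
        pooled = self.backbone(views.flatten(0, 1)).view(B, V, -1).mean(dim=1)
        out = self.to_latent(pooled).view(B, self.n_points, -1)
        xyz = out[..., :3]                                 # geometry part
        mu = out[..., 3:3 + self.feat_dim]                 # texture part
        logvar = out[..., 3 + self.feat_dim:]
        feat = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.
        return xyz, feat

class PointDenoiser(nn.Module):
    """Toy per-point denoiser standing in for a diffusion transformer."""
    def __init__(self, in_dim, cond_dim=0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + cond_dim + 1, 128), nn.SiLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x, t, cond=None):
        # broadcast the scalar timestep to every point, then denoise
        t_emb = t.view(-1, 1, 1).expand(x.shape[0], x.shape[1], 1)
        h = torch.cat([x, t_emb] + ([cond] if cond is not None else []), -1)
        return self.net(h)

@torch.no_grad()
def cascaded_sample(geo_net, tex_net, n_points=512, feat_dim=16, steps=50):
    """Stage 1 samples geometry (xyz); stage 2 samples per-point texture
    features conditioned on that geometry -- the shape-texture split."""
    xyz = torch.randn(1, n_points, 3)
    for i in reversed(range(steps)):            # naive denoising loop
        t = torch.full((1,), i / steps)
        xyz = xyz - geo_net(xyz, t) / steps
    feat = torch.randn(1, n_points, feat_dim)
    for i in reversed(range(steps)):
        t = torch.full((1,), i / steps)
        feat = feat - tex_net(feat, t, cond=xyz) / steps
    return xyz, feat   # decoded into renderable 3D Gaussians downstream

vae = PointCloudVAE()
xyz0, feat0 = vae.encode(torch.randn(1, 4, 7, 64, 64))  # 4 posed RGB-D-N views
geo_net = PointDenoiser(in_dim=3)
tex_net = PointDenoiser(in_dim=16, cond_dim=3)
xyz, feat = cascaded_sample(geo_net, tex_net)
print(xyz.shape, feat.shape)   # (1, 512, 3), (1, 512, 16)
```

Conditioning the texture stage on already-sampled geometry is what makes the disentanglement actionable: under this reading, one can keep the xyz latent fixed and resample only the per-point features, which is consistent with the abstract's claim that the latent space enables 3D-aware editing.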
