GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

November 12, 2024
Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
cs.AI

Abstract

While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.
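
To make the described pipeline concrete, below is a minimal PyTorch sketch of the two stages the abstract outlines: a VAE that encodes multi-view posed RGB-D-N renderings into a point-cloud-structured latent (point positions plus per-point features), followed by a cascaded diffusion sampler that first generates geometry and then texture features on the fixed geometry. Everything here is an illustrative assumption, not the paper's implementation: the module names (PointCloudLatentVAE, cascaded_sample), the 512-point latent, the 32-dim features, the placeholder CNN backbone, and the plain DDPM schedule are all stand-ins for the authors' actual architecture.

```python
import torch
import torch.nn as nn


class PointCloudLatentVAE(nn.Module):
    """Encodes V posed RGB-D-N views into N latent points (xyz + feature).
    The backbone is a placeholder; the paper's encoder differs."""

    def __init__(self, num_points=512, feat_dim=32, in_channels=7):
        super().__init__()
        # 7 channels per pixel: RGB (3) + depth (1) + normal (3).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_points = nn.Linear(128, num_points * (3 + 2 * feat_dim))
        self.num_points, self.feat_dim = num_points, feat_dim

    def encode(self, views):  # views: (B, V, 7, H, W)
        b, v = views.shape[:2]
        h = self.backbone(views.flatten(0, 1)).flatten(1)   # (B*V, 128)
        h = h.view(b, v, -1).mean(dim=1)                    # naive pooling over views
        out = self.to_points(h).view(b, self.num_points, -1)
        xyz = out[..., :3]                                  # point positions (shape)
        mu, logvar = out[..., 3:].chunk(2, dim=-1)          # per-point feature stats
        feat = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return xyz, feat


@torch.no_grad()
def ddpm_sample(eps_model, x, cond, steps=50):
    """Plain DDPM ancestral sampling with a linear beta schedule;
    eps_model(x_t, t, cond) is assumed to predict the added noise."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas, abar = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)
    for t in reversed(range(steps)):
        eps = eps_model(x, t, cond)
        x = (x - betas[t] / (1.0 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x


@torch.no_grad()
def cascaded_sample(shape_eps, texture_eps, cond, n_pts=512, feat_dim=32):
    """Cascade: stage 1 samples point positions (geometry); stage 2 samples
    per-point texture features with the geometry held fixed, mirroring the
    shape-texture disentanglement the abstract describes."""
    xyz = ddpm_sample(shape_eps, torch.randn(1, n_pts, 3), cond)
    feat = ddpm_sample(lambda x, t, c: texture_eps(x, xyz, t, c),
                       torch.randn(1, n_pts, feat_dim), cond)
    return xyz, feat  # a decoder would map these to 3D Gaussians


if __name__ == "__main__":
    vae = PointCloudLatentVAE()
    xyz, feat = vae.encode(torch.randn(2, 4, 7, 64, 64))  # 4 posed views
    print(xyz.shape, feat.shape)  # (2, 512, 3), (2, 512, 32)

    # Toy epsilon-predictors standing in for trained denoisers.
    shape_eps = lambda x, t, c: torch.zeros_like(x)
    tex_eps = lambda x, xyz, t, c: torch.zeros_like(x)
    xyz_s, feat_s = cascaded_sample(shape_eps, tex_eps, cond=None)
    print(xyz_s.shape, feat_s.shape)
```

The point of the two-stage split is that edits to geometry (the xyz stage) leave the texture stage's conditioning interface unchanged, which is how a point-cloud-structured latent naturally supports the 3D-aware, geometry-texture-disentangled editing the abstract claims.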
