GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

November 12, 2024
Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
cs.AI

Abstract

While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.
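
To make the described pipeline concrete, below is a minimal PyTorch sketch of the two stages the abstract outlines: a VAE that encodes multi-view posed RGB-D-N renderings into a point-cloud-structured latent (point positions plus per-point features), followed by a cascaded diffusion sampler that first generates geometry and then texture features on the fixed geometry. Everything here is an illustrative assumption, not the paper's implementation: the module names (PointCloudLatentVAE, cascaded_sample), the 512-point latent, the 32-dim features, the placeholder CNN backbone, and the plain DDPM schedule are all stand-ins for the authors' actual architecture.

```python
import torch
import torch.nn as nn


class PointCloudLatentVAE(nn.Module):
    """Encodes V posed RGB-D-N views into N latent points (xyz + feature).
    The backbone is a placeholder; the paper's encoder differs."""

    def __init__(self, num_points=512, feat_dim=32, in_channels=7):
        super().__init__()
        # 7 channels per pixel: RGB (3) + depth (1) + normal (3).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_points = nn.Linear(128, num_points * (3 + 2 * feat_dim))
        self.num_points, self.feat_dim = num_points, feat_dim

    def encode(self, views):  # views: (B, V, 7, H, W)
        b, v = views.shape[:2]
        h = self.backbone(views.flatten(0, 1)).flatten(1)   # (B*V, 128)
        h = h.view(b, v, -1).mean(dim=1)                    # naive pooling over views
        out = self.to_points(h).view(b, self.num_points, -1)
        xyz = out[..., :3]                                  # point positions (shape)
        mu, logvar = out[..., 3:].chunk(2, dim=-1)          # per-point feature stats
        feat = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return xyz, feat


@torch.no_grad()
def ddpm_sample(eps_model, x, cond, steps=50):
    """Plain DDPM ancestral sampling with a linear beta schedule;
    eps_model(x_t, t, cond) is assumed to predict the added noise."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas, abar = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)
    for t in reversed(range(steps)):
        eps = eps_model(x, t, cond)
        x = (x - betas[t] / (1.0 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x


@torch.no_grad()
def cascaded_sample(shape_eps, texture_eps, cond, n_pts=512, feat_dim=32):
    """Cascade: stage 1 samples point positions (geometry); stage 2 samples
    per-point texture features with the geometry held fixed, mirroring the
    shape-texture disentanglement the abstract describes."""
    xyz = ddpm_sample(shape_eps, torch.randn(1, n_pts, 3), cond)
    feat = ddpm_sample(lambda x, t, c: texture_eps(x, xyz, t, c),
                       torch.randn(1, n_pts, feat_dim), cond)
    return xyz, feat  # a decoder would map these to 3D Gaussians


if __name__ == "__main__":
    vae = PointCloudLatentVAE()
    xyz, feat = vae.encode(torch.randn(2, 4, 7, 64, 64))  # 4 posed views
    print(xyz.shape, feat.shape)  # (2, 512, 3), (2, 512, 32)

    # Toy epsilon-predictors standing in for trained denoisers.
    shape_eps = lambda x, t, c: torch.zeros_like(x)
    tex_eps = lambda x, xyz, t, c: torch.zeros_like(x)
    xyz_s, feat_s = cascaded_sample(shape_eps, tex_eps, cond=None)
    print(xyz_s.shape, feat_s.shape)
```

The point of the two-stage split is that edits to geometry (the xyz stage) leave the texture stage's conditioning interface unchanged, which is how a point-cloud-structured latent naturally supports the 3D-aware, geometry-texture-disentangled editing the abstract claims.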
