Structured 3D Latents for Scalable and Versatile 3D Generation

Abstract

We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent (SLAT) representation that allows decoding to different output formats, such as radiance fields, 3D Gaussians, and meshes. This is achieved by integrating a sparsely populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding. We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. Our models generate high-quality results from text or image conditions, significantly surpassing existing methods, including recent ones at similar scales. We showcase flexible output format selection and local 3D editing capabilities that were not offered by previous models. Code, model, and data will be released.
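To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a structured latent can be pictured as latent vectors stored only at the occupied cells of a 3D grid, and it uses the common linear-interpolation form of rectified flow, in which the regression target is the constant velocity from data to noise. The names `SparseLatentGrid` and `rectified_flow_pair` are hypothetical, and the direction convention (data at t = 0, noise at t = 1) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


@dataclass
class SparseLatentGrid:
    """Hypothetical container: latent vectors stored only at active voxels."""

    resolution: int  # grid size along each axis
    channels: int    # length of every latent vector
    latents: Dict[Voxel, List[float]] = field(default_factory=dict)

    def add(self, voxel: Voxel, vector: List[float]) -> None:
        """Attach a latent vector to one active voxel."""
        if not all(0 <= c < self.resolution for c in voxel):
            raise ValueError("voxel index outside the grid")
        if len(vector) != self.channels:
            raise ValueError("latent vector has the wrong number of channels")
        self.latents[voxel] = list(vector)

    def occupancy(self) -> float:
        """Fraction of grid cells that carry a latent vector."""
        return len(self.latents) / self.resolution ** 3


def rectified_flow_pair(
    x0: List[float], noise: List[float], t: float
) -> Tuple[List[float], List[float]]:
    """Return the interpolated sample x_t and the velocity target.

    Assumed convention: x_t = (1 - t) * x0 + t * noise, target = noise - x0.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity


# Usage: one active voxel in a 4x4x4 grid, then one training pair at t = 0.5.
grid = SparseLatentGrid(resolution=4, channels=2)
grid.add((0, 1, 2), [1.0, 2.0])
x_t, target = rectified_flow_pair(grid.latents[(0, 1, 2)], [0.0, 0.0], 0.5)
```

In a training loop under these assumptions, a model would receive `x_t` and `t` and be regressed onto `target`. Storing only active voxels keeps memory proportional to the occupied cells rather than to the full grid.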
