Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
March 3, 2025
Authors: Jiantao Lin, Xin Yang, Meixi Chen, Yingjie Xu, Dongyu Yan, Leyi Wu, Xinli Xu, Lie Xu, Shunsi Zhang, Ying-Cong Chen
cs.AI
Abstract
Diffusion models have achieved great success in generating 2D images.
However, the quality and generalizability of 3D content generation remain
limited. State-of-the-art methods often require large-scale 3D assets for
training, which are challenging to collect. In this work, we introduce
Kiss3DGen (Keep It Simple and Straightforward in 3D Generation), an efficient
framework for generating, editing, and enhancing 3D objects by repurposing a
well-trained 2D image diffusion model for 3D generation. Specifically, we
fine-tune a diffusion model to generate a "3D Bundle Image", a tiled
representation composed of multi-view images and their corresponding normal
maps. The normal maps are then used to reconstruct a 3D mesh, and the
multi-view images provide texture mapping, resulting in a complete 3D model.
This simple method effectively transforms the 3D generation problem into a 2D
image generation task, maximizing the utilization of knowledge in pretrained
diffusion models. Furthermore, we demonstrate that our Kiss3DGen model is
compatible with various diffusion model techniques, enabling advanced features
such as 3D editing and mesh and texture enhancement. Through extensive
experiments, we validate the effectiveness of our approach, showcasing its
ability to produce high-quality 3D models efficiently.
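To make the "tiled representation" concrete, below is a minimal sketch of how multi-view images and their normal maps could be packed into a single bundle image for a 2D diffusion model to generate jointly. The 2-row grid layout and the make_bundle_image helper are illustrative assumptions, not the paper's exact format.

```python
import numpy as np

def make_bundle_image(views, normals):
    """Tile N multi-view RGB renders (top row) and their corresponding
    normal maps (bottom row) into one "3D bundle image" array.

    views, normals: lists of H x W x 3 uint8 arrays, all the same shape.
    The 2 x N layout is a hypothetical choice for illustration.
    """
    assert len(views) == len(normals), "expect one normal map per view"
    top = np.concatenate(views, axis=1)       # H x (N*W) x 3
    bottom = np.concatenate(normals, axis=1)  # H x (N*W) x 3
    return np.concatenate([top, bottom], axis=0)

# Example: bundle four 256x256 views with their normal maps.
views = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4)]
normals = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4)]
bundle = make_bundle_image(views, normals)
print(bundle.shape)  # (512, 1024, 3)
```

Packing all views and normals into one image is what lets the method treat 3D generation as a single 2D image generation task: the diffusion model denoises every view and normal map jointly, keeping them mutually consistent.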