DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
January 28, 2025
Authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu
cs.AI
Abstract
Recent approaches to 3D content generation from text or a single image
struggle with limited high-quality 3D datasets and inconsistency from 2D
multi-view generation. We introduce DiffSplat, a novel 3D generative framework
that natively generates 3D Gaussian splats by taming large-scale text-to-image
diffusion models. It differs from previous 3D generative models by effectively
utilizing web-scale 2D priors while maintaining 3D consistency in a unified
model. To bootstrap the training, a lightweight reconstruction model is
proposed to instantly produce multi-view Gaussian splat grids for scalable
dataset curation. In conjunction with the regular diffusion loss on these
grids, a 3D rendering loss is introduced to facilitate 3D coherence across
arbitrary views. The compatibility with image diffusion models enables seamless
adaptations of numerous techniques for image generation to the 3D realm.
Extensive experiments reveal the superiority of DiffSplat in text- and
image-conditioned generation tasks and downstream applications. Thorough
ablation studies validate the efficacy of each critical design choice and
provide insights into the underlying mechanism.