Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation
November 21, 2024
Authors: Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Zhe Lin, Alan Yuille
cs.AI
Abstract
Existing feed-forward image-to-3D methods mainly rely on 2D multi-view
diffusion models that cannot guarantee 3D consistency. These methods easily
collapse when changing the prompt view direction and mainly handle
object-centric prompt images. In this paper, we propose a novel single-stage 3D
diffusion model, DiffusionGS, for object and scene generation from a single
view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to
enforce view consistency and allow the model to generate robustly given prompt
views from any direction, beyond object-centric inputs. In addition, to improve the
capability and generalization ability of DiffusionGS, we scale up 3D training
data by developing a scene-object mixed training strategy. Experiments show
that our method achieves better generation quality (2.20 dB higher in PSNR and
23.25 lower in FID) and over 5x faster speed (~6s on an A100 GPU) than SOTA
methods. The user study and text-to-3D applications also reveal the practical
value of our method. Our project page at
https://caiyuanhao1998.github.io/project/DiffusionGS/ shows video and
interactive generation results.
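
To put the PSNR figure quoted in the abstract in context, the following is a minimal sketch of the standard PSNR definition, not the authors' evaluation code. The function names `psnr` and `mse_ratio_for_gain` are hypothetical, and pixel values are assumed to be normalized to [0, 1].

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Return PSNR in dB between two equal-length sequences of pixel values.

    Assumption: pixels are flattened into plain sequences and scaled so
    that max_value is the largest possible intensity.
    """
    if len(reference) != len(rendered):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain on a single image."""
    return 10 ** (gain_db / 10)


if __name__ == "__main__":
    print(psnr([0.0, 0.0], [0.1, 0.1]))
    print(mse_ratio_for_gain(2.20))
```

For a single image pair, a 2.20 dB gain corresponds to a mean squared error roughly 1.66 times smaller. Because reported PSNR is usually averaged over many views, this ratio is only an intuition for the scale of the improvement, not a statement about the paper's per-image errors.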