SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

November 25, 2024
Authors: Hyojun Go, Byeongjun Park, Jiho Jang, Jin-Young Kim, Soonwoo Kwon, Changick Kim
cs.AI

Abstract

Text-based generation and editing of 3D scenes hold significant potential for streamlining content creation through intuitive user interactions. While recent advances leverage 3D Gaussian Splatting (3DGS) for high-fidelity, real-time rendering, existing methods are often specialized and task-focused, lacking a unified framework for both generation and editing. In this paper, we introduce SplatFlow, a comprehensive framework that addresses this gap by enabling direct 3DGS generation and editing. SplatFlow comprises two main components: a multi-view rectified flow (RF) model and a Gaussian Splatting Decoder (GSDecoder). The multi-view RF model operates in latent space, generating multi-view images, depths, and camera poses simultaneously, conditioned on text prompts, thus addressing challenges such as diverse scene scales and complex camera trajectories in real-world settings. The GSDecoder then efficiently translates these latent outputs into 3DGS representations through a feed-forward 3DGS method. Leveraging training-free inversion and inpainting techniques, SplatFlow enables seamless 3DGS editing and supports a broad range of 3D tasks, including object editing, novel view synthesis, and camera pose estimation, within a unified framework without requiring additional complex pipelines. We validate SplatFlow's capabilities on the MVImgNet and DL3DV-7K datasets, demonstrating its versatility and effectiveness in various 3D generation, editing, and inpainting-based tasks.
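
For readers unfamiliar with rectified flow, the generic formulation behind the multi-view RF model can be sketched as follows. This is a minimal toy illustration and not SplatFlow's implementation: the function names are hypothetical, the convention that noise sits at t = 0 and data at t = 1 is an assumption (some implementations use the reverse), and the oracle velocity stands in for what would be a learned, text-conditioned network over the joint image, depth, and camera pose latents.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path between noise x0 (t=0) and data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the velocity model: the constant direction x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(
    x0: Vector,
    velocity_fn: Callable[[Vector, float], Vector],
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: an oracle velocity (hypothetical stand-in for a trained model)
# that always points from the noise sample to the data sample.
noise = [0.0, 2.0]
data = [1.0, -1.0]
sample = euler_sample(noise, lambda x, t: velocity_target(noise, data), num_steps=4)
```

With a perfectly straight path, as in this toy case, a handful of Euler steps is enough to reach the data sample; a learned velocity field is only approximately straight, so the number of steps trades speed against accuracy.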
