SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
November 25, 2024
Authors: Hyojun Go, Byeongjun Park, Jiho Jang, Jin-Young Kim, Soonwoo Kwon, Changick Kim
cs.AI
Abstract
Text-based generation and editing of 3D scenes hold significant potential for
streamlining content creation through intuitive user interactions. While recent
advances leverage 3D Gaussian Splatting (3DGS) for high-fidelity and real-time
rendering, existing methods are often specialized and task-focused, lacking a
unified framework for both generation and editing. In this paper, we introduce
SplatFlow, a comprehensive framework that addresses this gap by enabling direct
3DGS generation and editing. SplatFlow comprises two main components: a
multi-view rectified flow (RF) model and a Gaussian Splatting Decoder
(GSDecoder). The multi-view RF model operates in latent space, generating
multi-view images, depths, and camera poses simultaneously, conditioned on text
prompts, thus addressing challenges like diverse scene scales and complex
camera trajectories in real-world settings. Then, the GSDecoder efficiently
translates these latent outputs into 3DGS representations through a
feed-forward 3DGS method. Leveraging training-free inversion and inpainting
techniques, SplatFlow enables seamless 3DGS editing and supports a broad range
of 3D tasks (including object editing, novel view synthesis, and camera pose
estimation) within a unified framework without requiring additional complex
pipelines. We validate SplatFlow's capabilities on the MVImgNet and DL3DV-7K
datasets, demonstrating its versatility and effectiveness in various 3D
generation, editing, and inpainting-based tasks.
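
The abstract names rectified flow as the generative backbone but does not spell out how sampling works. As a rough illustration only, the sketch below shows generic rectified-flow sampling by Euler integration of a velocity field from noise (t = 0) toward data (t = 1). It is not SplatFlow's implementation: `toy_velocity` is a hypothetical closed-form stand-in for the learned multi-view network, the "data distribution" is assumed to be a single target vector, and all names and shapes are assumptions made for this example.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    When the data distribution is a single point `target`, the ideal
    rectified-flow velocity at state x and time t is (target - x) / (1 - t).
    """
    return (target - x) / (1.0 - t)

def euler_sample(x0, target, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division above is safe
        x = x + dt * toy_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8))  # stand-in for a batch of 4 latent vectors
target = np.full(8, 2.0)             # assumed single "data" point
sample = euler_sample(noise, target)
```

Because the paths in this toy setting are straight lines, even a coarse Euler schedule lands on the target; a real model would replace `toy_velocity` with a text-conditioned network and decode the resulting latents separately.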