High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion
February 18, 2025
Authors: Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers
cs.AI
Abstract
Despite recent advances in Novel View Synthesis (NVS), generating
high-fidelity views from single or sparse observations remains a significant
challenge. Existing splatting-based approaches often produce distorted geometry
due to splatting errors. While diffusion-based methods leverage rich 3D priors
to achieve improved geometry, they often suffer from texture hallucination. In
this paper, we introduce SplatDiff, a pixel-splatting-guided video diffusion
model designed to synthesize high-fidelity novel views from a single image.
Specifically, we propose an aligned synthesis strategy for precise control of
target viewpoints and geometry-consistent view synthesis. To mitigate texture
hallucination, we design a texture bridge module that enables high-fidelity
texture generation through adaptive feature fusion. In this manner, SplatDiff
leverages the strengths of splatting and diffusion to generate novel views with
consistent geometry and high-fidelity details. Extensive experiments verify the
state-of-the-art performance of SplatDiff in single-view NVS. Additionally,
without extra training, SplatDiff shows remarkable zero-shot performance across
diverse tasks, including sparse-view NVS and stereo video conversion.
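The abstract states that the texture bridge module mitigates texture hallucination "through adaptive feature fusion" but gives no implementation details. The sketch below is only a minimal illustration of that general idea, not the authors' method: the module name `TextureBridge`, the per-pixel gating design, and the tensor shapes are all assumptions. It blends splatting-rendered features (faithful texture where splatting is reliable) with diffusion features (plausible content elsewhere) using a learned gate.

```python
import torch
import torch.nn as nn


class TextureBridge(nn.Module):
    """Hypothetical adaptive feature fusion (assumed design, not the paper's):
    blend splatting-rendered features with diffusion features via a learned
    per-pixel gate so reliable splatted texture is preserved."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate predictor: inspects both feature maps and outputs a 0-1 weight map.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, splat_feat: torch.Tensor, diff_feat: torch.Tensor) -> torch.Tensor:
        # splat_feat, diff_feat: (B, C, H, W) feature maps at the same resolution.
        w = self.gate(torch.cat([splat_feat, diff_feat], dim=1))
        # Where splatting is trusted (w -> 1) keep its texture;
        # elsewhere fall back to the diffusion features.
        return w * splat_feat + (1.0 - w) * diff_feat


# Toy usage with random tensors.
bridge = TextureBridge(channels=64)
fused = bridge(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```

The gated convex combination is one simple way to realize "adaptive" fusion; the actual SplatDiff module may fuse features at multiple decoder levels or with a different mechanism.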