High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion
February 18, 2025
Authors: Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers
cs.AI
Abstract
Despite recent advances in Novel View Synthesis (NVS), generating
high-fidelity views from single or sparse observations remains a significant
challenge. Existing splatting-based approaches often produce distorted geometry
due to splatting errors. While diffusion-based methods leverage rich 3D priors
to achieve improved geometry, they often suffer from texture hallucination. In
this paper, we introduce SplatDiff, a pixel-splatting-guided video diffusion
model designed to synthesize high-fidelity novel views from a single image.
Specifically, we propose an aligned synthesis strategy for precise control of
target viewpoints and geometry-consistent view synthesis. To mitigate texture
hallucination, we design a texture bridge module that enables high-fidelity
texture generation through adaptive feature fusion. In this manner, SplatDiff
leverages the strengths of splatting and diffusion to generate novel views with
consistent geometry and high-fidelity details. Extensive experiments verify the
state-of-the-art performance of SplatDiff in single-view NVS. Additionally,
without extra training, SplatDiff shows remarkable zero-shot performance across
diverse tasks, including sparse-view NVS and stereo video conversion.
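The abstract states that the texture bridge module mitigates texture hallucination "through adaptive feature fusion" but gives no implementation details. The sketch below is only a minimal illustration of that general idea, not the authors' method: the module name `TextureBridge`, the per-pixel gating design, and the tensor shapes are all assumptions. It blends splatting-rendered features (faithful texture where splatting is reliable) with diffusion features (plausible content elsewhere) using a learned gate.

```python
import torch
import torch.nn as nn


class TextureBridge(nn.Module):
    """Hypothetical adaptive feature fusion (assumed design, not the paper's):
    blend splatting-rendered features with diffusion features via a learned
    per-pixel gate so reliable splatted texture is preserved."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate predictor: inspects both feature maps and outputs a 0-1 weight map.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, splat_feat: torch.Tensor, diff_feat: torch.Tensor) -> torch.Tensor:
        # splat_feat, diff_feat: (B, C, H, W) feature maps at the same resolution.
        w = self.gate(torch.cat([splat_feat, diff_feat], dim=1))
        # Where splatting is trusted (w -> 1) keep its texture;
        # elsewhere fall back to the diffusion features.
        return w * splat_feat + (1.0 - w) * diff_feat


# Toy usage with random tensors.
bridge = TextureBridge(channels=64)
fused = bridge(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```

The gated convex combination is one simple way to realize "adaptive" fusion; the actual SplatDiff module may fuse features at multiple decoder levels or with a different mechanism.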