VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
April 2, 2025
Authors: Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan
cs.AI
Abstract
Recovering 3D scenes from sparse views is a challenging task due to its
inherently ill-posed nature. Conventional methods have developed specialized
solutions (e.g., geometry regularization or feed-forward deterministic models)
to mitigate the issue. However, they still suffer performance degradation when
input views overlap minimally and provide insufficient visual information.
Fortunately, recent video generative models show promise in addressing this
challenge, as they are capable of generating video clips with plausible 3D
structures. Powered by large pretrained video diffusion models, some pioneering
research has started to explore the potential of video generative priors and
create 3D scenes from sparse views. Despite impressive improvements, these
methods are limited by slow inference time and the lack of 3D constraints,
leading to inefficiencies and reconstruction artifacts that do not align with
real-world geometric structure. In this paper, we propose VideoScene, which
distills a video diffusion model to generate 3D scenes in one step, aiming to
build an efficient and effective tool to bridge the gap from video to 3D.
Specifically, we design a 3D-aware leap flow distillation strategy to leap over
time-consuming redundant information, and we train a dynamic denoising policy
network to adaptively determine the optimal leap timestep during inference.
Extensive experiments demonstrate that our VideoScene achieves faster and
superior 3D scene generation results compared with previous video diffusion
models, highlighting its potential as an efficient tool for future video-to-3D
applications. Project Page: https://hanyang-21.github.io/VideoScene
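The core inference idea described above can be illustrated with a minimal conceptual sketch: a policy picks a leap timestep t*, a coarse input is noised directly to t* (skipping the earlier, redundant denoising steps), and a distilled one-step model maps it straight back to a clean sample. All names here (`policy_net`, `student`, `one_step_generate`) and the linear noise schedule are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def add_noise(x0, t, noise, T=1000):
    """Diffusion forward process with a linear alpha-bar schedule
    (an assumption for illustration; the paper's schedule may differ)."""
    alpha_bar = 1.0 - t / T
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

def one_step_generate(coarse_render, policy_net, student, T=1000, seed=0):
    """Hypothetical one-step 'leap' inference: noise the coarse input to a
    policy-chosen timestep t*, then denoise in a single student call."""
    rng = np.random.default_rng(seed)
    t_star = policy_net(coarse_render)          # dynamic denoising policy picks t*
    noise = rng.standard_normal(coarse_render.shape)
    x_t = add_noise(coarse_render, t_star, noise, T)  # leap over steps t > t*
    return student(x_t, t_star)                 # one-step distilled denoiser

# Toy stand-ins: a fixed mid-range timestep and an identity "student".
frames = np.zeros((4, 8, 8, 3))                 # tiny 4-frame "video" tensor
out = one_step_generate(frames, lambda x: 400, lambda x, t: x)
print(out.shape)                                # same shape as the input video
```

The key contrast with standard diffusion sampling is that only one network evaluation occurs per generation, rather than hundreds of iterative denoising steps.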