VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
April 2, 2025
Authors: Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan
cs.AI
Abstract
Recovering 3D scenes from sparse views is a challenging task due to its
inherently ill-posed nature. Conventional methods have developed specialized
solutions (e.g., geometry regularization or feed-forward deterministic models)
to mitigate the issue. However, they still suffer performance degradation when
input views overlap minimally and provide insufficient visual information.
Fortunately, recent video generative models show promise in addressing this
challenge, as they are capable of generating video clips with plausible 3D
structures. Powered by large pretrained video diffusion models, some pioneering
research has started to explore the potential of video generative priors and
create 3D scenes from sparse views. Despite impressive improvements, these
methods are limited by slow inference time and the lack of 3D constraints,
leading to inefficiencies and reconstruction artifacts that do not align with
real-world geometric structure. In this paper, we propose VideoScene, which
distills a video diffusion model to generate 3D scenes in one step, aiming to
build an efficient and effective tool to bridge the gap from video to 3D.
Specifically, we design a 3D-aware leap flow distillation strategy to leap over
time-consuming redundant information, and we train a dynamic denoising policy
network to adaptively determine the optimal leap timestep during inference.
Extensive experiments demonstrate that our VideoScene achieves faster and
superior 3D scene generation results compared with previous video diffusion
models, highlighting its potential as an efficient tool for future video-to-3D
applications. Project Page: https://hanyang-21.github.io/VideoScene
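The core inference idea described above can be illustrated with a minimal conceptual sketch: a policy picks a leap timestep t*, a coarse input is noised directly to t* (skipping the earlier, redundant denoising steps), and a distilled one-step model maps it straight back to a clean sample. All names here (`policy_net`, `student`, `one_step_generate`) and the linear noise schedule are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def add_noise(x0, t, noise, T=1000):
    """Diffusion forward process with a linear alpha-bar schedule
    (an assumption for illustration; the paper's schedule may differ)."""
    alpha_bar = 1.0 - t / T
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

def one_step_generate(coarse_render, policy_net, student, T=1000, seed=0):
    """Hypothetical one-step 'leap' inference: noise the coarse input to a
    policy-chosen timestep t*, then denoise in a single student call."""
    rng = np.random.default_rng(seed)
    t_star = policy_net(coarse_render)          # dynamic denoising policy picks t*
    noise = rng.standard_normal(coarse_render.shape)
    x_t = add_noise(coarse_render, t_star, noise, T)  # leap over steps t > t*
    return student(x_t, t_star)                 # one-step distilled denoiser

# Toy stand-ins: a fixed mid-range timestep and an identity "student".
frames = np.zeros((4, 8, 8, 3))                 # tiny 4-frame "video" tensor
out = one_step_generate(frames, lambda x: 400, lambda x, t: x)
print(out.shape)                                # same shape as the input video
```

The key contrast with standard diffusion sampling is that only one network evaluation occurs per generation, rather than hundreds of iterative denoising steps.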