Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
April 10, 2025
Authors: Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi
cs.AI
Abstract
We introduce Geo4D, a method to repurpose video diffusion models for
monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic
prior captured by such video models, Geo4D can be trained using only synthetic
data while generalizing well to real data in a zero-shot manner. Geo4D predicts
several complementary geometric modalities, namely point, depth, and ray maps.
It uses a new multi-modal alignment algorithm to align and fuse these
modalities, as well as multiple sliding windows, at inference time, thus
obtaining robust and accurate 4D reconstruction of long videos. Extensive
experiments across multiple benchmarks show that Geo4D significantly surpasses
state-of-the-art video depth estimation methods, including recent methods such
as MonST3R, which are also designed to handle dynamic scenes.
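
The abstract mentions fusing predictions from multiple sliding windows at inference time. As a rough illustration of the sliding-window part only, here is a minimal NumPy sketch; it is not Geo4D's actual multi-modal alignment algorithm (which jointly aligns point, depth, and ray maps), and the function names (`fuse_windows`, `align_scale_shift`) and input format are hypothetical. It reconciles overlapping per-window depth predictions by fitting a closed-form scale-and-shift on each overlap and averaging the aligned windows.

```python
# Minimal sketch (assumed, not the authors' implementation): fuse depth maps
# predicted independently on overlapping sliding windows into one consistent
# sequence via least-squares scale/shift alignment on the overlaps.
import numpy as np


def align_scale_shift(src, ref):
    """Solve min_{s,t} || s * src + t - ref ||^2 in closed form."""
    src, ref = src.ravel(), ref.ravel()
    A = np.stack([src, np.ones_like(src)], axis=1)  # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return s, t


def fuse_windows(windows, num_frames):
    """Chain overlapping windows into one depth sequence.

    windows: list of (start_frame, depth) with depth of shape (T, H, W),
             ordered by start frame; each window is assumed to overlap the
             previous one, and together they cover all num_frames frames.
    """
    H, W = windows[0][1].shape[1:]
    fused = np.zeros((num_frames, H, W))
    weight = np.zeros(num_frames)

    for i, (start, depth) in enumerate(windows):
        depth = depth.astype(np.float64)
        if i > 0:
            # Align this window to the already-fused values on the overlap.
            prev_start, prev_depth = windows[i - 1]
            ov_hi = min(prev_start + prev_depth.shape[0],
                        start + depth.shape[0])
            ref = fused[start:ov_hi] / weight[start:ov_hi, None, None]
            s, t = align_scale_shift(depth[: ov_hi - start], ref)
            depth = s * depth + t
        T = depth.shape[0]
        fused[start:start + T] += depth
        weight[start:start + T] += 1.0

    return fused / weight[:, None, None]


if __name__ == "__main__":
    # Toy usage: two 4-frame windows over a 6-frame clip, 2-frame overlap.
    rng = np.random.default_rng(0)
    gt = rng.uniform(1.0, 5.0, size=(6, 8, 8))     # synthetic "true" depth
    w0 = (0, gt[0:4])                              # window 0, correct scale
    w1 = (2, 0.5 * gt[2:6] + 1.0)                  # window 1, scaled/shifted
    fused = fuse_windows([w0, w1], num_frames=6)
    print(np.abs(fused - gt).max())                # ~0: windows reconciled
```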