Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
April 10, 2025
Authors: Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi
cs.AI
Abstract
We introduce Geo4D, a method to repurpose video diffusion models for
monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic
prior captured by such video models, Geo4D can be trained using only synthetic
data while generalizing well to real data in a zero-shot manner. Geo4D predicts
several complementary geometric modalities, namely point, depth, and ray maps.
It uses a new multi-modal alignment algorithm to align and fuse these
modalities, as well as multiple sliding windows, at inference time, thus
obtaining robust and accurate 4D reconstruction of long videos. Extensive
experiments across multiple benchmarks show that Geo4D significantly surpasses
state-of-the-art video depth estimation methods, including recent methods such
as MonST3R, which are also designed to handle dynamic scenes.
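The abstract describes fusing predictions from multiple sliding windows at inference time but does not give implementation details. The sketch below is only a rough illustration of the sliding-window fusion idea: overlapping windows of per-frame depth predictions are chained together by solving a least-squares scale/shift on their shared frames. The helper names (`align_scale_shift`, `fuse_sliding_windows`) and the scale/shift alignment model are assumptions made here for illustration; they are not Geo4D's actual multi-modal alignment algorithm, which also aligns point and ray maps.

```python
import numpy as np

def align_scale_shift(src, ref):
    # Hypothetical helper: least-squares scale s and shift t such that
    # s * src + t best matches ref over all overlapping pixels.
    src = src.reshape(-1)
    ref = ref.reshape(-1)
    A = np.stack([src, np.ones_like(src)], axis=1)      # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)    # min ||s*src + t - ref||^2
    return s, t

def fuse_sliding_windows(windows, overlap):
    # Chain window-wise depth predictions into one consistent sequence.
    # `windows`: list of arrays [W, H, W_img]; consecutive windows share `overlap` frames.
    fused = list(windows[0])
    for win in windows[1:]:
        # Align the new window to the already-fused sequence on the shared frames.
        ref = np.stack(fused[-overlap:])
        s, t = align_scale_shift(win[:overlap], ref)
        aligned = s * win + t
        # Keep the existing overlap, append only the new frames.
        fused.extend(aligned[overlap:])
    return np.stack(fused)

# Toy usage: three 4-frame windows with a 2-frame overlap and inconsistent scales.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 5.0, size=(8, 16, 16))             # synthetic "true" depths
windows = [gt[0:4], gt[2:6] * 0.7 + 0.1, gt[4:8] * 1.3 - 0.2]
fused = fuse_sliding_windows(windows, overlap=2)
print(fused.shape, np.abs(fused - gt).mean())            # (8, 16, 16), small error
```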