GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
April 1, 2025
Authors: Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan
cs.AI
Abstract
Despite remarkable advances in video depth estimation, existing methods
exhibit inherent limitations in geometric fidelity because their predictions
are only affine-invariant, which limits their applicability to reconstruction
and other metrically grounded downstream tasks. We propose GeometryCrafter, a
novel framework that recovers high-fidelity point map sequences with temporal
coherence from open-world videos, enabling accurate 3D/4D reconstruction,
camera parameter estimation, and other depth-based applications. At the core of
our approach lies a point map Variational Autoencoder (VAE) that learns a
latent space agnostic to video latent distributions for effective point map
encoding and decoding. Leveraging this VAE, we train a video diffusion model that
learns the distribution of point map sequences conditioned on the input videos.
Extensive evaluations on diverse datasets demonstrate that GeometryCrafter
achieves state-of-the-art 3D accuracy, temporal consistency, and generalization
capability.
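The abstract contrasts affine-invariant depth with point maps. The plain-Python sketch below illustrates that contrast under stated assumptions. An affine-invariant prediction only becomes metric after an unknown scale s and shift t are fitted, and here they are fitted by least squares against ground-truth depth, which is not available at inference time. A per-pixel point map, by contrast, already encodes 3D structure, so a pinhole focal length can be recovered from it directly. The function names (`align_scale_shift`, `unproject`, `estimate_focal`), the pinhole model with square pixels and a known principal point, and the least-squares focal solver are all hypothetical illustrations, not GeometryCrafter's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def align_scale_shift(pred: List[float], target: List[float]) -> Tuple[float, float]:
    """Least-squares scale s and shift t minimizing sum((s * p + t - g) ** 2).

    Shows why an affine-invariant prediction is not metric on its own:
    s and t must first be fitted against a reference (here, ground truth).
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(target) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, target))
    var = sum((p - mean_p) ** 2 for p in pred)
    s = cov / var
    return s, mean_g - s * mean_p


def unproject(depth: List[List[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[List[Point]]:
    """Convert a depth map (rows of z values) into a camera-space point map (pinhole model)."""
    return [
        [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def estimate_focal(points: List[List[Point]], cx: float, cy: float) -> float:
    """Recover a single focal length f from a point map.

    Assumes square pixels (fx == fy), a known principal point, and positive depth.
    Minimizes sum((u - cx - f * X / Z) ** 2 + (v - cy - f * Y / Z) ** 2), whose
    closed-form solution is f = sum(offset * ratio) / sum(ratio ** 2).
    """
    num = den = 0.0
    for v, row in enumerate(points):
        for u, (x, y, z) in enumerate(row):
            for offset, ratio in ((u - cx, x / z), (v - cy, y / z)):
                num += offset * ratio
                den += ratio * ratio
    return num / den


if __name__ == "__main__":
    gt = [2.0, 3.0, 5.0, 4.0]
    pred = [0.5 * g + 1.0 for g in gt]  # affine-distorted depth: scale 0.5, shift 1.0
    print("scale, shift:", align_scale_shift(pred, gt))

    depth = [[2.0 + 0.2 * v + 0.5 * u for u in range(4)] for v in range(4)]
    points = unproject(depth, 600.0, 600.0, 1.5, 1.5)
    print("focal:", estimate_focal(points, 1.5, 1.5))
    # A global rescale of the point map leaves X / Z and Y / Z, and thus f, unchanged.
    scaled = [[(3 * x, 3 * y, 3 * z) for x, y, z in row] for row in points]
    print("focal after rescale:", estimate_focal(scaled, 1.5, 1.5))
```

In this sketch, `align_scale_shift` depends on a ground-truth reference. The focal solve uses only the point map itself, and it is unaffected by a global rescaling of that map.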