GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

April 1, 2025
Authors: Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan
cs.AI

Abstract

Despite remarkable advancements in video depth estimation, existing methods are inherently limited in geometric fidelity because their predictions are affine-invariant, which restricts their applicability to reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity, temporally coherent point map sequences from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach lies a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging this VAE, we train a video diffusion model to learn the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
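The abstract states that point maps, unlike affine-invariant depth, support camera parameter estimation. As a minimal sketch of why, assume a pinhole camera with square pixels and a known principal point (assumptions not stated in the source). Under these assumptions the focal length can be recovered from a camera-space point map with a closed-form least-squares fit. This is a generic illustration, not GeometryCrafter's own procedure, and the helper names `depth_to_point_map` and `estimate_focal` are hypothetical. For a video, the same fit would be applied per frame of the point map sequence.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
PointMap = List[List[Point]]  # rows (v) of columns (u) of camera-space (X, Y, Z)


def depth_to_point_map(depth: List[List[float]], f: float, cx: float, cy: float) -> PointMap:
    """Unproject a depth map into camera-space 3D points (hypothetical helper).

    Assumes a pinhole model: X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    """
    return [
        [((u - cx) * z / f, (v - cy) * z / f, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def estimate_focal(point_map: PointMap, cx: float, cy: float) -> float:
    """Least-squares focal length from a point map (hypothetical helper).

    Assumes square pixels and a known principal point (cx, cy). Minimizes
    sum((f * X/Z - (u - cx))**2 + (f * Y/Z - (v - cy))**2) over f,
    whose closed-form solution is f = sum(a * b) / sum(a * a).
    """
    num = 0.0
    den = 0.0
    for v, row in enumerate(point_map):
        for u, (x, y, z) in enumerate(row):
            if z <= 0:
                continue  # skip invalid points (e.g. sky or missing depth)
            a_x, a_y = x / z, y / z  # normalized image coordinates
            b_x, b_y = u - cx, v - cy  # pixel offsets from the principal point
            num += a_x * b_x + a_y * b_y
            den += a_x * a_x + a_y * a_y
    if den == 0.0:
        raise ValueError("point map has no valid off-axis points")
    return num / den


# Usage: synthesize a small point map with a known focal length and recover it.
depth = [[2.0 + 0.1 * (u + v) for u in range(8)] for v in range(6)]
pm = depth_to_point_map(depth, f=500.0, cx=3.5, cy=2.5)
print(round(estimate_focal(pm, cx=3.5, cy=2.5), 6))  # 500.0
```

Only the ratios X/Z and Y/Z enter the fit, so a global rescaling of the point map leaves the focal estimate unchanged. A depth map by itself carries no X or Y coordinates, so it cannot constrain the focal length this way without externally supplied intrinsics.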

April 2, 2025