GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Abstract

Despite remarkable advances in video depth estimation, existing methods are inherently limited in geometric fidelity because they produce affine-invariant predictions, which restricts their applicability to reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity, temporally coherent point map sequences from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach is a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging this VAE, we train a video diffusion model to model the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
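To make concrete why a point map supports camera parameter estimation, the sketch below unprojects a metric depth map into a camera-space point map and then recovers the focal length from that point map alone by closed-form least squares. This is an illustrative sketch, not GeometryCrafter's method: it assumes a pinhole camera with square pixels and the principal point at the image center, and the function names `unproject_depth` and `estimate_focal` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(depth: List[List[float]], focal: float) -> List[List[Point]]:
    """Turn a metric depth map into a camera-space point map.

    Assumes (hypothetically) a pinhole camera with square pixels and the
    principal point at the image center.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    points: List[List[Point]] = []
    for v in range(h):
        row: List[Point] = []
        for u in range(w):
            z = depth[v][u]
            # Pinhole back-projection: x = (u - cx) * z / f, y = (v - cy) * z / f
            row.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
        points.append(row)
    return points


def estimate_focal(points: List[List[Point]]) -> float:
    """Least-squares focal length from a point map (same camera assumptions).

    Minimises sum over pixels of (f*a - (u - cx))^2 + (f*b - (v - cy))^2,
    where (a, b) = (x/z, y/z). Setting the derivative to zero gives
    f = sum(a*(u - cx) + b*(v - cy)) / sum(a^2 + b^2).
    """
    h, w = len(points), len(points[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    num, den = 0.0, 0.0
    for v in range(h):
        for u in range(w):
            x, y, z = points[v][u]
            a, b = x / z, y / z  # normalised image coordinates
            num += a * (u - cx) + b * (v - cy)
            den += a * a + b * b
    return num / den


if __name__ == "__main__":
    # Synthetic 4x4 depth map of a slanted surface, unprojected with f = 500.
    depth = [[2.0 + 0.1 * (u + v) for u in range(4)] for v in range(4)]
    pts = unproject_depth(depth, focal=500.0)
    print(round(estimate_focal(pts), 6))
```

An affine-invariant depth map, known only up to an unknown scale and shift, carries no intrinsics and cannot be unprojected this way without extra information. A practical pipeline would also use a robust solver and possibly estimate the principal point rather than fixing it at the image center.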
