Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
December 9, 2024
作者: Nicolas Dufour, David Picard, Vicky Kalogeiton, Loic Landrieu
cs.AI
Abstract
Global visual geolocation predicts where an image was captured on Earth.
Since images vary in how precisely they can be localized, this task inherently
involves a significant degree of ambiguity. However, existing approaches are
deterministic and overlook this aspect. In this paper, we aim to close the gap
between traditional geolocalization and modern generative methods. We propose
the first generative geolocation approach based on diffusion and Riemannian
flow matching, where the denoising process operates directly on the Earth's
surface. Our model achieves state-of-the-art performance on three visual
geolocation benchmarks: OpenStreetView-5M, YFCC-100M, and iNat21. In addition,
we introduce the task of probabilistic visual geolocation, where the model
predicts a probability distribution over all possible locations instead of a
single point. We introduce new metrics and baselines for this task,
demonstrating the advantages of our diffusion-based approach. Code and models
will be made available.
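To make the idea of denoising "directly on the Earth's surface" concrete, the sketch below illustrates Riemannian flow matching on the unit sphere S^2 with plain NumPy: a geodesic (slerp) path connects a random noise location to the true camera location, and the regression target is the conditional velocity given by the Riemannian log map. This is only a minimal sketch under common flow-matching conventions; the function names, the slerp-based conditional path, and the example coordinates are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: Riemannian flow matching on the unit sphere S^2
# (a stand-in for the Earth's surface). Illustrative only, not the
# authors' code; names and the geodesic conditional path are assumptions.
import numpy as np

def latlon_to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector on S^2."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def slerp(x0, x1, t):
    """Geodesic interpolation between two unit vectors (spherical lerp)."""
    omega = np.arccos(np.clip(np.dot(x0, x1), -1.0, 1.0))
    if omega < 1e-8:                      # points nearly coincide
        return x1
    return (np.sin((1 - t) * omega) * x0 + np.sin(t * omega) * x1) / np.sin(omega)

def log_map(x, y):
    """Riemannian log map on S^2: tangent vector at x pointing toward y."""
    omega = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    v = y - np.dot(x, y) * x              # project y onto the tangent plane at x
    norm = np.linalg.norm(v)
    return np.zeros(3) if norm < 1e-8 else (omega / norm) * v

# One conditional training pair: at time t, x_t lies on the geodesic from a
# random "noise" location x0 to the true location x1, and the target velocity
# is u_t = log_{x_t}(x1) / (1 - t), which a network conditioned on the image
# and t would be trained to regress.
rng = np.random.default_rng(0)
x1 = latlon_to_xyz(48.8566, 2.3522)                  # hypothetical target: Paris
x0 = rng.normal(size=3); x0 /= np.linalg.norm(x0)    # uniform noise on the sphere
t = rng.uniform()
x_t = slerp(x0, x1, t)
u_t = log_map(x_t, x1) / (1.0 - t)
print(t, x_t, u_t)
```

At inference, one would start from noise on the sphere and integrate the learned velocity field over a fixed number of timesteps, staying on the manifold at every step; sampling many such trajectories yields the distribution over locations used in the probabilistic geolocation task.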