Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

December 9, 2024
Authors: Nicolas Dufour, David Picard, Vicky Kalogeiton, Loic Landrieu
cs.AI

Abstract

Global visual geolocation predicts where an image was captured on Earth. Since images vary in how precisely they can be localized, this task inherently involves a significant degree of ambiguity. However, existing approaches are deterministic and overlook this aspect. In this paper, we aim to close the gap between traditional geolocalization and modern generative methods. We propose the first generative geolocation approach based on diffusion and Riemannian flow matching, where the denoising process operates directly on the Earth's surface. Our model achieves state-of-the-art performance on three visual geolocation benchmarks: OpenStreetView-5M, YFCC-100M, and iNat21. In addition, we introduce the task of probabilistic visual geolocation, where the model predicts a probability distribution over all possible locations instead of a single point. We introduce new metrics and baselines for this task, demonstrating the advantages of our diffusion-based approach. Code and models will be made available.
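
The core technical idea, denoising directly on the Earth's surface, can be made concrete with a small sketch. The snippet below is an illustrative assumption, not the authors' released code: it shows the geodesic conditional path commonly used in Riemannian flow matching on the unit sphere S^2, where a training point is interpolated along the great circle between a noise sample and the true camera location, and the regression target is the tangent velocity of that path. The function names (latlon_to_unit, slerp, geodesic_velocity) and the choice of Paris as a target are hypothetical.

```python
# Minimal sketch (assumed, not the paper's implementation) of the conditional
# geodesic path for Riemannian flow matching on the sphere S^2.
import numpy as np

def latlon_to_unit(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector on S^2."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def slerp(x0, x1, t):
    """Geodesic (great-circle) interpolation between unit vectors x0 and x1 at time t."""
    omega = np.arccos(np.clip(np.dot(x0, x1), -1.0, 1.0))  # angle between the two points
    if omega < 1e-8:                                        # nearly identical points
        return x1.copy()
    return (np.sin((1 - t) * omega) * x0 + np.sin(t * omega) * x1) / np.sin(omega)

def geodesic_velocity(xt, x1):
    """Log map at xt toward x1: tangent vector whose norm is the remaining geodesic distance."""
    cos_d = np.clip(np.dot(xt, x1), -1.0, 1.0)
    d = np.arccos(cos_d)                 # remaining geodesic distance
    v = x1 - cos_d * xt                  # component of x1 tangent to the sphere at xt
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-8 else (d / n) * v

# One illustrative training sample: noise point -> target location, at a random time t.
rng = np.random.default_rng(0)
x0 = rng.normal(size=3); x0 /= np.linalg.norm(x0)   # uniform noise sample on S^2
x1 = latlon_to_unit(48.8566, 2.3522)                 # hypothetical target (Paris)
t = rng.uniform()                                    # t in [0, 1), so 1 - t > 0 below
xt = slerp(x0, x1, t)                                # noisy point the model would see
ut = geodesic_velocity(xt, x1) / (1 - t)             # standard geodesic conditional velocity
print(t, xt, ut)
```

In the generic flow-matching recipe, a network conditioned on image features would regress this velocity; at inference, integrating the learned field from noise down to a point on the sphere yields a sample, and repeating the process yields the location distributions that the probabilistic geolocation task evaluates.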
