Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework

February 19, 2025
Authors: Zirui Song, Jingpu Yang, Yuan Huang, Jonathan Tonglet, Zeyu Zhang, Tao Cheng, Meng Fang, Iryna Gurevych, Xiuying Chen
cs.AI

Abstract

Geolocation, the task of identifying an image's location, requires complex reasoning and is crucial for navigation, monitoring, and cultural preservation. However, current methods often produce coarse, imprecise, and non-interpretable localization. A major challenge lies in the quality and scale of existing geolocation datasets. These datasets are typically small-scale and automatically constructed, leading to noisy data and inconsistent task difficulty, with images that either reveal answers too easily or lack sufficient clues for reliable inference. To address these challenges, we introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric, collectively designed to address critical challenges and drive advancements in geolocation research. At the core of this framework is GeoComp (Geolocation Competition Dataset), a large-scale dataset collected from a geolocation game platform involving 740K users over two years. It comprises 25 million entries of metadata and 3 million geo-tagged locations spanning much of the globe, with each location annotated thousands to tens of thousands of times by human users. The dataset offers diverse difficulty levels for detailed analysis and highlights key gaps in current models. Building on this dataset, we propose Geographical Chain-of-Thought (GeoCoT), a novel multi-step reasoning framework designed to enhance the reasoning capabilities of Large Vision Models (LVMs) in geolocation tasks. GeoCoT improves performance by integrating contextual and spatial cues through a multi-step process that mimics human geolocation reasoning. Finally, using the GeoEval metric, we demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
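
The abstract reports accuracy gains but does not define how GeoEval scores a prediction. As a rough illustration only, the sketch below uses a common convention in geolocation work: compute the great-circle (haversine) distance between predicted and true coordinates, then report the fraction of predictions that fall within a distance threshold. This is an assumption for illustration, not the GeoEval metric. The function names `haversine_km` and `accuracy_within`, and the thresholds in the usage example, are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(preds, truths, threshold_km):
    """Fraction of predictions within threshold_km of the true location.

    preds and truths are equal-length sequences of (lat, lon) tuples.
    This thresholded accuracy is an illustrative stand-in, not GeoEval.
    """
    if not truths or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and equal length")
    hits = sum(
        haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical predictions and ground truth, for demonstration only.
    preds = [(48.86, 2.35), (40.71, -74.01)]
    truths = [(48.85, 2.29), (34.05, -118.24)]
    for km in (25, 750):  # hypothetical thresholds
        print(km, accuracy_within(preds, truths, km))
```

Sweeping the threshold from small to large gives a coarse picture of whether a model's errors are street-level, city-level, or country-level.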
