Multimodal large language models (MLLMs) have developed rapidly and now play a central role in geospatial scene understanding. Recent studies have sought to enhance the reasoning capabilities of remote sensing MLLMs, typically through cold-start training on elaborately curated chain-of-thought (CoT) data. However, this approach not only incurs substantial annotation costs but also introduces human biases that may limit the diversity of model reasoning. To address these challenges, we propose GeoZero, a framework that enables MLLMs to perform geospatial reasoning without any predefined CoT supervision. Specifically, we construct two datasets, GeoZero-Instruct and GeoZero-Hard. GeoZero-Instruct allows the model to acquire preliminary geospatial knowledge through supervised fine-tuning, while GeoZero-Hard stimulates deep reasoning during the subsequent reinforcement learning stage. Furthermore, we introduce Answer-Anchored Group Relative Policy Optimization (A$^2$GRPO), in which the reasoning process is regularized by the model's own answers, encouraging diverse yet accurate thinking. Extensive experiments on multiple remote sensing vision-language benchmarks demonstrate that GeoZero not only surpasses existing state-of-the-art methods but also fosters universal emergent reasoning capabilities across diverse geospatial tasks. Code, data, and models will be made publicly available at https://github.com/MiliLab/GeoZero.
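To make the A$^2$GRPO idea concrete, the sketch below illustrates the general shape of a group-relative policy optimization (GRPO) advantage computation combined with a hypothetical answer-anchored reward term. The abstract does not specify the reward design, so the function names, the anchoring signal, and the weights `w_acc` and `w_anchor` are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only: a GRPO-style group-relative advantage with a
# hypothetical "answer-anchored" reward bonus. Names and weights are
# assumptions for exposition, not taken from the GeoZero paper.
from statistics import mean, pstdev


def answer_anchored_reward(answer_correct: bool,
                           reasoning_consistent: bool,
                           w_acc: float = 1.0,
                           w_anchor: float = 0.5) -> float:
    """Combine answer accuracy with a bonus when the reasoning trace is
    consistent with (anchored to) the model's own final answer."""
    return w_acc * float(answer_correct) + w_anchor * float(reasoning_consistent)


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against the mean and
    standard deviation of its sampled group (no learned value critic)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: a group of 4 sampled responses to one geospatial question,
# each scored on (answer correct?, reasoning anchored to the answer?).
group = [(True, True), (True, False), (False, False), (False, True)]
rewards = [answer_anchored_reward(a, c) for a, c in group]
advs = grpo_advantages(rewards)
```

Under this toy scoring, responses whose reasoning agrees with a correct final answer receive the highest group-relative advantage, which is the qualitative effect the abstract describes: the model's own answers regularize its reasoning.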