Remote sensing change detection aims to perceive changes occurring on the Earth's surface from remote sensing data acquired at different times and to report these changes to users. However, most existing methods focus solely on detecting change regions and cannot interact with users to identify the specific changes they are interested in. In this paper, we introduce a new task, Change Detection Question Answering and Grounding (CDQAG), which extends traditional change detection by providing interpretable textual answers together with intuitive visual evidence. To this end, we construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks. It covers 10 essential land-cover categories and 8 comprehensive question types, providing a large-scale and diverse resource for remote sensing applications. Building on this, we present VisTA, a simple yet effective baseline method that unifies question answering and grounding by delivering both textual and visual answers. Our method achieves state-of-the-art results on both the classic CDVQA dataset and the proposed CDQAG benchmark. Extensive qualitative and quantitative experimental results provide useful insights for developing better CDQAG models, and we hope this work inspires further research in this important yet underexplored direction. The proposed benchmark dataset and method are available at https://github.com/like413/VisTA.