This paper introduces the task of Remote Sensing Copy-Move Question Answering (RSCMQA). Unlike traditional Remote Sensing Visual Question Answering (RSVQA), RSCMQA focuses on interpreting complex tampering scenarios and inferring relationships between objects. Motivated by the practical needs of national defense security and land resource monitoring, we build an accurate and comprehensive global dataset for remote sensing image copy-move question answering, named RS-CMQA-2.1M. Its images were collected from 29 different regions across 14 countries. We also derive a balanced dataset, RS-CMQA-B, to address the long-standing problem of long-tail data distributions in the remote sensing field. Furthermore, we propose a region-discriminative guided multimodal CMQA model, which improves the accuracy of answering questions about tampered images by leveraging prompts about the differences and connections between the source and tampered domains. Extensive experiments demonstrate that our method provides a stronger benchmark for RS-CMQA than general VQA and RSVQA models. Our dataset and code are available at https://github.com/shenyedepisa/RSCMQA.