Localizing objects of interest in remote sensing images is valuable in many practical applications. Referring image segmentation, which aims to segment the objects to which a given expression refers, has been extensively studied for natural images. However, this task has received almost no research attention for remote sensing imagery. Considering its potential for real-world applications, in this paper we introduce referring remote sensing image segmentation (RRSIS) to fill this gap and explore the task. Specifically, we create a new dataset for this task, called RefSegRS, which enables us to evaluate different methods. We then benchmark referring image segmentation methods designed for natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that uses linguistic features to adaptively enhance multi-scale visual features by integrating both deep and shallow features. The proposed dataset, benchmarking results, and LGCE module provide insights into the design of better RRSIS models. We will make our dataset and code publicly available.
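To make the idea of language-guided fusion of deep and shallow features more concrete, the following is a minimal sketch and not the LGCE module itself, whose architecture is not specified in this abstract. It assumes a simple per-channel gate computed from a pooled sentence embedding; the function names, the projection matrix `w_gate`, and all tensor shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def language_guided_fusion(shallow, deep, text, w_gate):
    """Blend a shallow and a deep feature map with a text-dependent gate.

    shallow: (C, H, W) high-resolution features
    deep:    (C, H/f, W/f) low-resolution features, f an integer
    text:    (D,) pooled sentence embedding
    w_gate:  (C, D) hypothetical learned projection (an assumption here)
    """
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)         # (C, H, W)
    gate = sigmoid(w_gate @ text)[:, None, None]     # (C, 1, 1), values in (0, 1)
    # Per-channel convex combination: the text decides how much each
    # channel relies on shallow detail versus deep context.
    return gate * shallow + (1.0 - gate) * deep_up


# Toy usage with random tensors (shapes are illustrative only).
rng = np.random.default_rng(0)
shallow = rng.normal(size=(8, 16, 16))
deep = rng.normal(size=(8, 4, 4))
text = rng.normal(size=(32,))
w_gate = rng.normal(size=(8, 32)) * 0.1
fused = language_guided_fusion(shallow, deep, text, w_gate)
print(fused.shape)
```

The gate here is a deliberately simple stand-in: it keeps the output at the shallow resolution, which matters for small objects, while letting the expression modulate how much deep context is mixed in.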