Referring Remote Sensing Image Segmentation (RRSIS) aims to segment instances in remote sensing images according to referring expressions. Unlike in Referring Image Segmentation on general images, acquiring high-quality referring expressions in the remote sensing domain is particularly challenging due to the prevalence of small, densely distributed objects and complex backgrounds. This paper introduces a new learning paradigm for RRSIS, Weakly Referring Expression Learning (WREL), which leverages abundant class names as weakly referring expressions together with a small set of accurate ones to enable efficient training under limited annotation conditions. Furthermore, we provide a theoretical analysis showing that the performance gap between mixed-referring training and training with fully annotated referring expressions admits a provable upper bound, thereby establishing the validity of this new setting. We also propose LRB-WREL, which integrates a Learnable Reference Bank (LRB) to refine weakly referring expressions through sample-specific prompt embeddings that enrich coarse class-name inputs. Combined with a teacher-student optimization framework using dynamically scheduled EMA updates, LRB-WREL stabilizes training and enhances cross-modal generalization under noisy weakly referring supervision. Extensive experiments on our newly constructed benchmark with varying weakly referring data ratios validate both the theoretical insights and the practical effectiveness of WREL and LRB-WREL, demonstrating that they can approach or even surpass models trained with fully annotated referring expressions.
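The teacher-student framework with dynamically scheduled EMA updates mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the schedule, so the cosine ramp below, the function names `ema_momentum` and `ema_update`, and the default momentum values are all assumptions made for illustration, not the authors' implementation. Parameters are represented as plain dictionaries of floats to keep the example self-contained.

```python
import math


def ema_momentum(step, total_steps, base=0.99, final=0.9999):
    """Hypothetical schedule: cosine ramp of the EMA momentum from `base` to `final`.

    A low momentum early lets the teacher follow the student quickly;
    a high momentum late makes the teacher a slowly moving average.
    """
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return final - (final - base) * (math.cos(math.pi * progress) + 1.0) / 2.0


def ema_update(teacher, student, momentum):
    """Return new teacher parameters: m * teacher + (1 - m) * student."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    # Toy parameters standing in for model weights (assumed values).
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    total_steps = 100
    for step in range(total_steps):
        m = ema_momentum(step, total_steps)
        teacher = ema_update(teacher, student, m)
    print(teacher["w"])
```

In a real training loop the student would be updated by gradient descent between calls, and only the teacher would be updated through `ema_update`.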