Salient object detection (SOD) in remote sensing images faces significant challenges: object sizes vary widely, self-attention mechanisms are computationally expensive, and CNN-based feature extractors have limited ability to capture global context and long-range dependencies. Existing methods that rely on fixed convolution kernels often struggle to adapt to diverse object scales, which leads to loss of detail or aggregation of irrelevant features. To address these issues, this work aims to improve robustness to scale variations and achieve precise object localization. We propose the Region Proportion-Aware Dynamic Adaptive Salient Object Detection Network (RDNet), which replaces the CNN backbone with the Swin Transformer for global context modeling and introduces three key modules: (1) the Dynamic Adaptive Detail-aware (DAD) module, which applies different convolution kernels guided by the proportion of the image occupied by the object region; (2) the Frequency-matching Context Enhancement (FCE) module, which enriches contextual information through wavelet interactions and attention; and (3) the Region Proportion-aware Localization (RPL) module, which uses cross-attention to highlight semantic details and integrates a Proportion Guidance (PG) block that assists the DAD module. By combining these modules, RDNet is robust to scale variations and localizes objects accurately, outperforming state-of-the-art methods in detection performance.
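The abstract does not say how the DAD module turns a region proportion into a choice of kernels. The sketch below is a minimal, hypothetical illustration of the general idea only, not RDNet's implementation. It assumes a coarse saliency mask is available (for example, from something playing the role of the PG block), measures the fraction of foreground positions, and maps that proportion to softmax weights over kernel sizes 3, 5, and 7. Small regions favor small kernels, and large regions favor large ones. Fixed mean filters stand in for learned convolutions. All function names (`region_proportion`, `kernel_weights`, `box_filter`, `dynamic_detail_conv`) and the proportion-to-size mapping are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def region_proportion(mask: Grid, threshold: float = 0.5) -> float:
    """Fraction of positions whose coarse saliency score exceeds `threshold`."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    fg = sum(1 for row in mask for v in row if v > threshold)
    return fg / total


def kernel_weights(p: float, sizes: Sequence[int] = (3, 5, 7),
                   tau: float = 1.0) -> List[float]:
    """Hypothetical mapping from proportion p in [0, 1] to kernel mixing weights.

    The preferred kernel size moves linearly from the smallest to the largest
    size as p grows; a softmax over negative distances keeps weights summing to 1.
    """
    p = min(max(p, 0.0), 1.0)
    target = sizes[0] + p * (sizes[-1] - sizes[0])  # preferred kernel size
    logits = [-abs(k - target) / tau for k in sizes]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def box_filter(feat: Grid, k: int) -> Grid:
    """k x k mean filter with border-clipped windows (stand-in for a learned conv)."""
    h, w = len(feat), len(feat[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [feat[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def dynamic_detail_conv(feat: Grid, coarse_mask: Grid,
                        sizes: Sequence[int] = (3, 5, 7)) -> Grid:
    """Blend multi-kernel responses with weights chosen from the region proportion."""
    weights = kernel_weights(region_proportion(coarse_mask), sizes)
    responses = [box_filter(feat, k) for k in sizes]
    h, w = len(feat), len(feat[0])
    return [[sum(wt * resp[i][j] for wt, resp in zip(weights, responses))
             for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    feat = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
    # A single salient pixel: a tiny region, so the 3x3 kernel should dominate.
    small = [[1.0 if (i, j) == (3, 3) else 0.0 for j in range(8)] for i in range(8)]
    print(kernel_weights(region_proportion(small)))
    out = dynamic_detail_conv(feat, small)
    print(len(out), len(out[0]))
```

Blending the responses with soft weights, rather than picking a single kernel, keeps the selection differentiable in a learned setting. This is one common design for dynamic kernel selection, but whether RDNet uses soft or hard selection is not stated in the abstract.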