Object detection in remote sensing images (RSIs) often faces several growing challenges, including large variation in object scales and widely varying context. Prior methods address these challenges by expanding the spatial receptive field of the backbone, through either large-kernel convolution or dilated convolution. However, the former typically introduces considerable background noise, while the latter risks generating overly sparse feature representations. In this paper, we introduce the Poly Kernel Inception Network (PKINet) to handle these challenges. PKINet employs multi-scale convolution kernels without dilation to extract object features at varying scales and capture local context. In addition, a Context Anchor Attention (CAA) module is introduced in parallel to capture long-range contextual information. Together, these two components improve the performance of PKINet on four challenging remote sensing detection benchmarks: DOTA-v1.0, DOTA-v1.5, HRSC2016, and DIOR-R.
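The sparsity concern raised above for dilated convolution can be made concrete with a little arithmetic. The sketch below is only illustrative: the helper names and the kernel sizes and dilation rate are assumptions chosen for the example, not settings taken from the text. It compares the window spanned by a kernel with the fraction of positions inside that window that the kernel actually reads.

```python
def effective_extent(kernel_size: int, dilation: int = 1) -> int:
    """Side length of the square window spanned by a (possibly dilated) kernel."""
    return dilation * (kernel_size - 1) + 1


def sampling_density(kernel_size: int, dilation: int = 1) -> float:
    """Fraction of positions inside the spanned window that the kernel reads."""
    extent = effective_extent(kernel_size, dilation)
    return (kernel_size ** 2) / (extent ** 2)


# Hypothetical settings for illustration only (not values from the text).
dense_kernel_sizes = [3, 5, 7]   # parallel kernels without dilation
dilated_kernel = (3, 3)          # a 3x3 kernel with dilation rate 3

for k in dense_kernel_sizes:
    # Non-dilated kernels read every position in their window (density 1.0).
    print(f"dense {k}x{k}: extent={effective_extent(k)}, "
          f"density={sampling_density(k):.3f}")

k, d = dilated_kernel
# The dilated kernel spans the same window as a dense 7x7 kernel
# but reads only 9 of its 49 positions.
print(f"dilated {k}x{k} (rate {d}): extent={effective_extent(k, d)}, "
      f"density={sampling_density(k, d):.3f}")
```

Under these assumed settings, the dilated kernel covers the same 7x7 window as the largest dense kernel while sampling under a fifth of it, which is one way to read the "overly sparse" description; a set of non-dilated kernels of different sizes covers several window sizes with no gaps.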