While one-stage detectors like YOLOv8 offer fast training speed, they often underperform on small objects as a trade-off. This limitation becomes more critical when detecting tiny objects in aerial imagery, where targets are low-resolution and backgrounds are cluttered. To address this, we introduce four enhancement strategies that can be easily implemented on YOLOv8: input image resolution adjustment, data augmentation, attention mechanisms, and an alternative gating function for attention modules. We demonstrate that enlarging the input image size and using augmentation properly can improve detection performance. Additionally, we designed a Mixture of Orthogonal Neural-modules Network (MoonNet) pipeline that consists of multiple attention-module-augmented CNNs. Two well-known attention modules, the Squeeze-and-Excitation (SE) Block and the Convolutional Block Attention Module (CBAM), were integrated into the backbone of YOLOv8 to form the MoonNet design. The MoonNet backbone achieved higher detection accuracy than both the original YOLOv8 backbone and backbones augmented with a single type of attention module. MoonNet further demonstrated its adaptability and potential by achieving state-of-the-art performance on a tiny-object benchmark when integrated with the YOLC model. Our code is available at: https://github.com/Kihyun11/MoonNet
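To make the role of the gating function concrete, the following is a minimal pure-Python sketch of the arithmetic inside an SE-style block (squeeze, excitation, scale) with a swappable gate. It is an illustration under stated assumptions, not the MoonNet implementation: the function names, the toy weights, and the choice of a hard sigmoid as the "alternative" gate are all hypothetical, since the abstract does not specify which gating function is used.

```python
import math

def sigmoid(x):
    """Standard logistic gate used in the original SE design."""
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    """Piecewise-linear gate, clip(x / 6 + 0.5, 0, 1).
    Assumption: a stand-in for an alternative gate, not the paper's choice."""
    return min(1.0, max(0.0, x / 6.0 + 0.5))

def se_block(feature_map, w_reduce, w_expand, gate=sigmoid):
    """Apply SE-style channel attention.

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    (C // r) x C weight matrix of the first dense layer.
    w_expand:    C x (C // r) weight matrix of the second dense layer.
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: dense -> ReLU -> dense -> gate.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    weights = [
        gate(sum(w * h for w, h in zip(row, hidden)))
        for row in w_expand
    ]
    # Scale: reweight every spatial position of each channel.
    scaled = [
        [[v * a for v in row] for row in ch]
        for ch, a in zip(feature_map, weights)
    ]
    return scaled, weights

# Toy input: two 2x2 channels and hand-picked (hypothetical) weights.
toy_map = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_reduce = [[0.5, 0.5]]       # 2 channels -> 1 hidden unit
toy_expand = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channels

_, soft_weights = se_block(toy_map, toy_reduce, toy_expand, gate=sigmoid)
hard_scaled, hard_weights = se_block(toy_map, toy_reduce, toy_expand, gate=hard_sigmoid)
```

Swapping only the `gate` argument changes how sharply channels are emphasized or suppressed while leaving the squeeze and scale steps untouched, which is why a gating function can be treated as an independent design choice within an attention module.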