Although convolutional neural networks have achieved outstanding results in visible-light object detection, infrared small object detection remains challenging because of low signal-to-noise ratios, incomplete object structure, and the lack of a reliable infrared small object dataset. To address the dataset limitation, a new dataset named InfraTiny was constructed; it contains 3218 images with a total of 20,893 bounding boxes, more than 85% of which are smaller than 32×32 pixels. A multi-scale attention mechanism module (MSAM) and a Feature Fusion Augmentation Pyramid Module (FFAFPM) are proposed and deployed on embedded devices. MSAM gives the network scale-aware information by using different receptive fields, and it suppresses background noise to strengthen feature extraction. FFAFPM enriches semantic information and strengthens the fusion of shallow and deep features, which significantly reduces false positives. Integrating the proposed modules into the YOLO model yields Infra-YOLO, which improves infrared small object detection performance. On the InfraTiny dataset, mAP@0.5 improves by 2.7% over YOLOv3 and by 2.5% over YOLOv4. Infra-YOLO was also transferred to an embedded device on an unmanned aerial vehicle (UAV) for real application scenarios, where channel pruning is used to reduce FLOPs and to achieve a trade-off between speed and accuracy. Even with the parameters of Infra-YOLO reduced by 88% through pruning, mAP@0.5 still improves by 0.7% over YOLOv3 and by 0.5% over YOLOv4. Experimental results show that the proposed MSAM and FFAFPM improve infrared small object detection performance compared with the previous benchmark methods.
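The small-object statistic quoted for InfraTiny (more than 85% of bounding boxes smaller than 32×32 pixels) can be computed for any annotation set in a few lines. The following is a minimal sketch, assuming that "smaller than 32×32" means a box area below 32 × 32 = 1024 pixels, as in the common COCO-style definition of a small object; the abstract does not state the exact criterion. The function name `small_box_fraction`, the `(width, height)` box representation, and the toy values are hypothetical and are not taken from the paper or the dataset.

```python
from typing import Iterable, List, Tuple

# Hypothetical box representation: (width, height) in pixels.
Box = Tuple[float, float]


def small_box_fraction(boxes: Iterable[Box], side: int = 32) -> float:
    """Return the fraction of boxes whose area is below side * side pixels.

    Assumption: "smaller than 32x32" is read as an area criterion.
    """
    box_list: List[Box] = list(boxes)
    if not box_list:
        raise ValueError("no bounding boxes given")
    limit = side * side
    small = sum(1 for width, height in box_list if width * height < limit)
    return small / len(box_list)


# Toy annotations invented for illustration only (not InfraTiny data).
toy_boxes = [(10, 12), (20, 31), (31, 31), (40, 40), (8, 100)]
fraction = small_box_fraction(toy_boxes)
print(f"{fraction:.0%} of the toy boxes are below 32x32 pixels in area")
```

Under a stricter reading, where both the width and the height must be below 32 pixels, the long thin box `(8, 100)` would no longer count as small, so the criterion should be fixed before comparing statistics across datasets.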