This paper presents a generalized model for real-time detection of flying objects that can be used for transfer learning and further research, as well as a refined model that is ready for implementation. We first train the generalized model on a data set containing 40 different classes of flying objects, which forces the model to extract abstract feature representations. We then perform transfer learning with these learned parameters on a data set more representative of real-world environments (i.e., higher frequency of occlusion, small spatial sizes, rotations, etc.) to generate our refined model. Detecting flying objects remains challenging due to large variance in object spatial size and aspect ratio, object speed, occlusion, and cluttered backgrounds. To address some of these challenges while maximizing performance, we use the current state-of-the-art single-shot detector, YOLOv8, in an attempt to find the best tradeoff between inference speed and mAP. Although YOLOv8 is regarded as the new state of the art, no official paper has been published. We therefore provide an in-depth explanation of the new architecture and functionality that YOLOv8 has adopted. Our final generalized model achieves an mAP50-95 of 0.685 and an average inference speed of 50 fps on 1080p video. Our final refined model maintains this inference speed and achieves an improved mAP50-95 of 0.835.
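The two reported figures, mAP50-95 and frames per second, can be unpacked with a little arithmetic. The following is a minimal sketch, assuming the common COCO-style convention in which mAP50-95 is the mean of the mAP values measured at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names and the per-threshold values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def iou_thresholds(start: float = 0.50, stop: float = 0.95,
                   step: float = 0.05) -> List[float]:
    """Return the IoU thresholds 0.50, 0.55, ..., 0.95 (assumed convention)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def map50_95(map_per_threshold: Sequence[float]) -> float:
    """Average per-threshold mAP values, ordered from IoU 0.50 up to 0.95."""
    expected = len(iou_thresholds())
    if len(map_per_threshold) != expected:
        raise ValueError(f"expected {expected} values, got {len(map_per_threshold)}")
    return sum(map_per_threshold) / expected


def frame_budget_ms(fps: float) -> float:
    """Convert a throughput in frames per second to a per-frame time in ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical per-threshold values, for illustration only.
    example = [0.95, 0.93, 0.91, 0.88, 0.85, 0.80, 0.74, 0.65, 0.50, 0.30]
    print(f"mAP50-95: {map50_95(example):.3f}")
    print(f"Per-frame budget at 50 fps: {frame_budget_ms(50):.1f} ms")
```

Under this convention, a model scores well only if its boxes remain accurate at strict overlap thresholds, and a throughput of 50 fps corresponds to roughly 20 ms of processing time per frame.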