In recent years, YOLO-series models have emerged as the leading approaches in real-time object detection. Many studies have raised the baseline by modifying the architecture, augmenting data, and designing new losses. However, we find that previous models still suffer from information fusion problems, although the Feature Pyramid Network (FPN) and Path Aggregation Network (PANet) have alleviated them. Therefore, this study proposes an advanced Gather-and-Distribute (GD) mechanism, which is realized with convolution and self-attention operations. The newly designed model, named Gold-YOLO, boosts multi-scale feature fusion capabilities and achieves an ideal balance between latency and accuracy across all model scales. Additionally, we implement MAE-style pretraining in the YOLO series for the first time, allowing YOLO-series models to benefit from unsupervised pretraining. Gold-YOLO-N attains an outstanding 39.9% AP on the COCO val2017 dataset and 1030 FPS on a T4 GPU, outperforming the previous SOTA model YOLOv6-3.0-N, which has similar FPS, by +2.4% AP. The PyTorch code is available at https://github.com/huaweinoah/Efficient-Computing/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO.