In recent years, YOLO-series models have emerged as the leading approaches to real-time object detection. Many studies have raised the baseline by modifying the architecture, augmenting data, and designing new losses. However, we find that previous models still suffer from an information fusion problem, although Feature Pyramid Network (FPN) and Path Aggregation Network (PANet) have alleviated it. Therefore, this study proposes an advanced Gather-and-Distribute (GD) mechanism, which is realized with convolution and self-attention operations. The newly designed model, named Gold-YOLO, boosts multi-scale feature fusion capabilities and achieves an ideal balance between latency and accuracy across all model scales. Additionally, we implement MAE-style pretraining in the YOLO series for the first time, allowing YOLO-series models to benefit from unsupervised pretraining. Gold-YOLO-N attains an outstanding 39.9% AP on the COCO val2017 dataset and 1030 FPS on a T4 GPU, outperforming the previous SOTA model YOLOv6-3.0-N, which runs at a similar FPS, by +2.4% AP. The PyTorch code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO.
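To make the gather-then-distribute idea concrete, the following is a minimal NumPy sketch, not the Gold-YOLO implementation. It assumes square feature maps, nearest-neighbour resizing, and random 1x1-convolution weights, and it omits the self-attention branch entirely. The names `resize_to` and `gather_and_distribute` are hypothetical and chosen only for illustration.

```python
import numpy as np


def resize_to(feat, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    _, h, w = feat.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return feat[:, rows][:, :, cols]


def gather_and_distribute(features, rng):
    """Fuse all pyramid levels once, then hand the result back to each level.

    Assumption: a 1x1 convolution is modelled as a per-pixel linear map over
    channels with random weights; a real model would learn these weights.
    """
    # Gather: align every level to the middle level's resolution and
    # concatenate along the channel axis.
    target = features[len(features) // 2].shape[1]
    gathered = np.concatenate([resize_to(f, target) for f in features], axis=0)

    # Fuse: mix the channels of the gathered map with a 1x1 convolution.
    c_total = gathered.shape[0]
    w_fuse = rng.standard_normal((c_total, c_total)) / np.sqrt(c_total)
    fused = np.einsum("oc,chw->ohw", w_fuse, gathered)

    # Distribute: project the fused map to each level's channel count,
    # resize it to that level's resolution, and add it to the local feature.
    outputs = []
    for f in features:
        w_proj = rng.standard_normal((f.shape[0], c_total)) / np.sqrt(c_total)
        global_info = np.einsum("oc,chw->ohw", w_proj, fused)
        outputs.append(f + resize_to(global_info, f.shape[1]))
    return outputs


rng = np.random.default_rng(0)
# Three hypothetical pyramid levels as (channels, height, width) arrays.
levels = [rng.standard_normal((c, s, s)) for c, s in [(8, 32), (16, 16), (32, 8)]]
outputs = gather_and_distribute(levels, rng)
print([o.shape for o in outputs])
```

Each output keeps the shape of its input level. The point of the sketch is that every level receives information fused from all levels in a single step, rather than only from its neighbours as in a top-down or bottom-up pyramid pass.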