Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism

In recent years, YOLO-series models have emerged as the leading approaches in real-time object detection. Many studies have raised the baseline by modifying the architecture, augmenting data, and designing new losses. However, we find that previous models still suffer from an information fusion problem, although Feature Pyramid Network (FPN) and Path Aggregation Network (PANet) have alleviated it. Therefore, this study proposes an advanced Gather-and-Distribute (GD) mechanism, which is realized with convolution and self-attention operations. The newly designed model, named Gold-YOLO, boosts multi-scale feature fusion capabilities and achieves an ideal balance between latency and accuracy across all model scales. Additionally, we implement MAE-style pretraining in the YOLO series for the first time, allowing YOLO-series models to benefit from unsupervised pretraining. Gold-YOLO-N attains an outstanding 39.9% AP on the COCO val2017 dataset and 1030 FPS on a T4 GPU, outperforming the previous SOTA model YOLOv6-3.0-N, which runs at a similar FPS, by +2.4% AP. The PyTorch code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO.
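The abstract does not spell out how the GD mechanism operates, so the following is only a toy sketch of the general gather-then-distribute idea, not the Gold-YOLO implementation. It assumes 1-D feature lists in place of feature maps, and it uses a plain element-wise mean as a stand-in for the learned convolution and self-attention fusion. The function names `resize`, `gather`, and `distribute` are hypothetical.

```python
# Toy 1-D illustration of a gather-and-distribute style fusion.
# Assumption: every name below is hypothetical; this is NOT the Gold-YOLO code.

def resize(feature, length):
    """Nearest-neighbour resize of a 1-D feature list to `length` samples."""
    n = len(feature)
    return [feature[i * n // length] for i in range(length)]

def gather(levels, length):
    """Align every level to a common length, then fuse by element-wise mean.

    The mean is a placeholder for a learned fusion (convolution or
    self-attention in the abstract's description).
    """
    aligned = [resize(f, length) for f in levels]
    return [sum(vals) / len(vals) for vals in zip(*aligned)]

def distribute(levels, fused):
    """Inject the fused global feature back into each level by addition."""
    return [
        [x + g for x, g in zip(f, resize(fused, len(f)))]
        for f in levels
    ]

# Three pyramid levels with resolutions 8, 4 and 2.
levels = [[1.0] * 8, [2.0] * 4, [3.0] * 2]
fused = gather(levels, length=4)      # one global feature built from all levels
outputs = distribute(levels, fused)   # each level keeps its own resolution
```

In this sketch every output level receives information from all input levels in a single step, whereas an FPN- or PANet-style neck passes information between adjacent levels one hop at a time.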

