Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism

In recent years, YOLO-series models have emerged as the leading approaches in real-time object detection. Many studies have raised the baseline by modifying the architecture, augmenting data, and designing new losses. However, we find that previous models still suffer from an information fusion problem, although Feature Pyramid Network (FPN) and Path Aggregation Network (PANet) have alleviated it. This study therefore proposes an advanced Gather-and-Distribute (GD) mechanism, which is realized with convolution and self-attention operations. The newly designed model, named Gold-YOLO, boosts multi-scale feature fusion capabilities and achieves an ideal balance between latency and accuracy across all model scales. Additionally, we implement MAE-style pretraining in the YOLO series for the first time, allowing YOLO-series models to benefit from unsupervised pretraining. Gold-YOLO-N attains an outstanding 39.9% AP on the COCO val2017 dataset and 1030 FPS on a T4 GPU, outperforming the previous SOTA model YOLOv6-3.0-N, which has similar FPS, by 2.4% AP. The PyTorch code is available at https://github.com/huaweinoah/Efficient-Computing/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO.
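The abstract does not spell out how the GD mechanism works internally, so the following is only a toy sketch of the general gather-then-distribute idea: features from several scales are aligned to one resolution and fused into a single global feature, which is then injected back into every scale. As stated assumptions, plain averaging and element-wise addition stand in for the convolution and self-attention operations the abstract mentions, feature maps are 1-D lists, and all function names are hypothetical rather than taken from the Gold-YOLO code.

```python
from typing import List

Feature = List[float]


def resize_nearest(feature: Feature, length: int) -> Feature:
    """Resample a 1-D feature to `length` samples by nearest-neighbour indexing."""
    n = len(feature)
    return [feature[min(n - 1, (i * n) // length)] for i in range(length)]


def gather(levels: List[Feature], length: int) -> Feature:
    """Align every level to a common length, then average them into one global feature.

    Assumption: averaging replaces the learned fusion used in the real model.
    """
    aligned = [resize_nearest(f, length) for f in levels]
    return [sum(values) / len(aligned) for values in zip(*aligned)]


def distribute(levels: List[Feature], global_feature: Feature) -> List[Feature]:
    """Inject the global feature into each level by element-wise addition.

    Assumption: addition replaces the learned injection used in the real model.
    """
    return [
        [x + g for x, g in zip(f, resize_nearest(global_feature, len(f)))]
        for f in levels
    ]


def gather_and_distribute(levels: List[Feature]) -> List[Feature]:
    """Fuse all levels once, then hand the fused result back to every level."""
    # Assumption: the middle level's resolution is used as the common size.
    target_length = len(levels[len(levels) // 2])
    return distribute(levels, gather(levels, target_length))


# A three-level pyramid with constant features at resolutions 8, 4, and 2.
pyramid = [[1.0] * 8, [2.0] * 4, [3.0] * 2]
fused = gather_and_distribute(pyramid)
```

In this sketch every level receives information from all other levels in a single step, instead of passing it only between neighbouring levels; each output keeps the resolution of its input level.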

