The proposed YOLO-Former method combines ideas from the transformer architecture with YOLOv4 to build an accurate and efficient object detection system. It retains the fast inference speed of YOLOv4 and adds the strengths of the transformer architecture by integrating convolutional attention and transformer modules. On the Pascal VOC dataset, the approach achieves a mean average precision (mAP) of 85.76\% at a prediction speed of 10.85 frames per second. The contribution of this work is to demonstrate that combining these two state-of-the-art techniques can yield further improvements in object detection.