Blood cell detection is a typical small-scale object detection problem in computer vision. In this paper, we propose CST-YOLO, a blood cell detection model that builds on the YOLOv7 architecture and enhances it with the CNN-Swin Transformer (CST), a new attempt at CNN-Transformer fusion. To improve precision on small-scale objects, we also introduce three further modules into CST-YOLO: Weighted Efficient Layer Aggregation Networks (W-ELAN), Multiscale Channel Split (MCS), and Concatenate Convolutional Layers (CatConv). Experimental results show that CST-YOLO achieves 92.7%, 95.6%, and 91.1% mAP@0.5 on three blood cell datasets, respectively, outperforming state-of-the-art object detectors such as RT-DETR, YOLOv5, and YOLOv7. Our code is available at https://github.com/mkang315/CST-YOLO.
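The mAP@0.5 metric reported above counts a predicted box as correct when its Intersection over Union (IoU) with a ground-truth box of the same class is at least 0.5. The following is a minimal sketch of that matching criterion, assuming boxes are given as (x1, y1, x2, y2) corner coordinates. The function names `iou` and `is_true_positive` are illustrative and are not taken from the CST-YOLO repository.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Matching rule assumed here: IoU at or above the threshold (0.5 for mAP@0.5)."""
    return iou(pred_box, gt_box) >= threshold
```

For a 10 × 10 pixel box, a horizontal shift of 2 pixels gives an IoU of 80/120 ≈ 0.67 and still counts as a match, while a shift of 5 pixels gives 50/150 ≈ 0.33 and does not. This sensitivity to small localization errors is one reason small-scale objects such as blood cells are hard to detect precisely.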