We propose ASF-YOLO, a novel Attentional Scale Sequence Fusion based You Only Look Once (YOLO) framework that combines spatial and scale features for accurate and fast cell instance segmentation. Built on the YOLO segmentation framework, it employs the Scale Sequence Feature Fusion (SSFF) module to enhance the network's multi-scale information extraction, and the Triple Feature Encoder (TPE) module to fuse feature maps of different scales and so add detailed information. We further introduce a Channel and Position Attention Mechanism (CPAM) that integrates the SSFF and TPE modules and focuses on informative channels and on small objects related to spatial position, improving detection and segmentation performance. Experiments on two cell datasets demonstrate the segmentation accuracy and speed of the proposed ASF-YOLO model. On the 2018 Data Science Bowl dataset, it achieves a box mAP of 0.91, a mask mAP of 0.887, and an inference speed of 47.3 FPS, outperforming state-of-the-art methods. The source code is available at https://github.com/mkang315/ASF-YOLO.
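The idea of attending first to informative channels and then to spatial positions can be illustrated with a toy sketch. This is not the CPAM implementation from the ASF-YOLO repository: the function names are hypothetical, the gates are plain sigmoid functions of average-pooled values with no learned parameters, and the feature map is a nested list of shape [channels][height][width] so that the example runs without any dependencies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.

    Hypothetical helper: a real module would pass the pooled values
    through learned layers before the sigmoid.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def position_attention(fmap):
    """Scale each spatial position by a gate from its cross-channel mean."""
    num_channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    gates = [
        [
            sigmoid(sum(fmap[c][y][x] for c in range(num_channels)) / num_channels)
            for x in range(width)
        ]
        for y in range(height)
    ]
    return [
        [[fmap[c][y][x] * gates[y][x] for x in range(width)] for y in range(height)]
        for c in range(num_channels)
    ]


def channel_then_position(fmap):
    """Apply the channel gate first, then the position gate."""
    return position_attention(channel_attention(fmap))


if __name__ == "__main__":
    # Two channels of size 2x2: one silent, one uniformly active.
    feature_map = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[2.0, 2.0], [2.0, 2.0]],
    ]
    for channel in channel_then_position(feature_map):
        print(channel)
```

Both steps preserve the shape of the feature map and only rescale its values, which is why such attention blocks can be placed between existing fusion stages without changing the surrounding layer dimensions.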