Real-time Strawberry Detection Based on Improved YOLOv5s Architecture for Robotic Harvesting in open-field environment

This study proposed a custom YOLOv5-based object detection model to detect strawberries in an outdoor environment. The original YOLOv5s architecture was modified in two ways. First, the C3 module in the backbone network was replaced with the C2f module, which provided better feature gradient flow. Second, the Spatial Pyramid Pooling Fast (SPPF) module in the final layer of the YOLOv5s backbone was combined with a Cross Stage Partial Network to improve generalization on the strawberry dataset used in this study. The proposed architecture was named YOLOv5s-Straw. An RGB image dataset of strawberry canopies with three maturity classes (immature, nearly mature, and mature) was collected in an open-field environment and augmented through a series of operations including brightness reduction, brightness increase, and noise addition. To verify the superiority of the proposed method for strawberry detection in an open-field environment, four competing detection models (YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s) were trained and tested under the same computational environment and compared with YOLOv5s-Straw. The proposed architecture achieved the highest mean average precision, 80.3%, compared with 73.4%, 77.8%, 79.8%, and 79.3% for YOLOv3-tiny, YOLOv5s, YOLOv5s-C2f, and YOLOv8s, respectively. Specifically, the average precision of YOLOv5s-Straw was 82.1% in the immature class, 73.5% in the nearly mature class, and 86.6% in the mature class, which were 2.3% and 3.7% higher, respectively, than those of the latest YOLOv8s. The model had 8.6 × 10^6 network parameters and an inference time of 18 ms per image, whereas YOLOv8s was slower (21.0 ms per image) and heavier (11.1 × 10^6 parameters), indicating that the proposed model is fast enough for real-time strawberry detection and localization in robotic picking.
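
The real-time claim above rests on the reported per-image inference times and parameter counts. As a rough check, the short Python sketch below converts those figures into throughput and relative savings. It uses only the numbers quoted in the abstract; the helper names are hypothetical, and it assumes that the reported time covers model inference only, so image capture and pre- or post-processing would lower the achievable frame rate.

```python
# Figures quoted in the abstract: (network parameters, inference time in ms per image).
MODELS = {
    "YOLOv5s-Straw": (8.6e6, 18.0),
    "YOLOv8s": (11.1e6, 21.0),
}


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


def relative_reduction(new: float, baseline: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


straw_params, straw_ms = MODELS["YOLOv5s-Straw"]
v8_params, v8_ms = MODELS["YOLOv8s"]

# Per-model summary: size, latency, and the throughput implied by that latency.
for name, (params, ms) in MODELS.items():
    fps = frames_per_second(ms)
    print(f"{name}: {params / 1e6:.1f}M parameters, {ms:.1f} ms/image, {fps:.1f} images/s")

# Savings of the proposed model relative to YOLOv8s.
print(f"Parameter reduction vs YOLOv8s: {relative_reduction(straw_params, v8_params):.1%}")
print(f"Latency reduction vs YOLOv8s: {relative_reduction(straw_ms, v8_ms):.1%}")
```

Under these assumptions, 18 ms per image corresponds to about 55.6 images per second against about 47.6 for YOLOv8s, with roughly 22.5% fewer parameters and 14.3% lower latency.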
