In farming systems, harvesting operations are tedious and consume considerable time and resources. Deploying a fleet of autonomous robots to work alongside farmworkers could therefore bring substantial productivity and logistics benefits. To do so, an intelligent robotic system must monitor human behavior, identify ongoing activities, and anticipate the workers' needs. The main contribution of this work is a benchmark model for video-based detection of human pickers and classification of their activities, intended to support harvesting operations across different agricultural scenarios. Our solution combines a Mask Region-based Convolutional Neural Network (Mask R-CNN) for object detection with optical flow for motion estimation, together with newly added statistical attributes of the flow motion descriptors, named Correlation Sensitivity (CS). A classification criterion is defined based on Kernel Density Estimation (KDE) analysis and the K-means clustering algorithm, both implemented on an in-house dataset collected from different crop fields, such as strawberry polytunnels and apple tree orchards. The proposed framework is evaluated quantitatively using sensitivity, specificity, and accuracy measures, and shows satisfactory results despite dataset challenges such as lighting variation, blur, and occlusions.
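The three evaluation measures named in the abstract have standard definitions in terms of confusion-matrix counts. The following is a minimal Python sketch of those definitions, assuming a binary setting in which one activity class is scored against all others. The `ConfusionCounts` type, the function names, and the example counts are hypothetical and chosen for illustration only; they are not results from the dataset described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # true positives: activity present and detected
    fp: int  # false positives: activity absent but detected
    tn: int  # true negatives: activity absent and not detected
    fn: int  # false negatives: activity present but missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a class has no samples.
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


# Illustrative counts only (not taken from the paper).
example = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
```

With these illustrative counts, `sensitivity(example)` is 40/50, `specificity(example)` is 45/50, and `accuracy(example)` is 85/100. For more than two activity classes, one common approach is to compute these per class in a one-vs-rest fashion; whether the authors do so is not stated in the abstract.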