Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning-Based Sim-to-Real Control

This study presents a closed-loop robotic strawberry harvesting system that combines a robust vision module, simulation-trained deep reinforcement learning (DRL) control, and ROS-based real-robot execution. For perception, we propose HRAttnEdge-YOLO26-seg, a modified YOLO26-seg architecture that incorporates a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning to improve instance segmentation in cluttered scenes. For control, we train a target-conditioned Proximal Policy Optimization (PPO) policy in Isaac Lab to produce smooth joint-position commands for a UR10e manipulator, and deploy it on a physical UR10e robot for target-fruit reaching and harvesting. This simulation-based approach reduces hardware dependency, lowers development cost, and allows scalable policy training without exhaustive physical trials before real deployment. The proposed vision model achieved the highest overall performance among the evaluated methods, improving segmentation performance by 10 to 14% on both self-collected and public datasets. In controlled in-house tests, the PPO controller produced stable and dynamically smoother motion than an inverse kinematics (IK)-based MoveIt baseline. In greenhouse trials, the integrated system harvested 281 strawberries, achieving 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. These results illustrate that task-specific perception combined with simulation-trained PPO can serve as a practical, resource-efficient alternative to conventional planner-dependent reaching in manipulation, enabling reliable closed-loop robotic harvesting in complex agricultural environments.
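To make the control terms above concrete, the following is a minimal Python sketch of what a "target-conditioned" policy interface and the standard PPO clipped surrogate objective look like. It is an illustration under stated assumptions, not the authors' implementation: the observation layout, the per-step joint limit `max_delta`, the clip range `clip_eps = 0.2`, and all function names are hypothetical, because the abstract does not specify the observation space, action scaling, or hyperparameters, and no Isaac Lab API is used here.

```python
import math
from typing import Sequence

NUM_JOINTS = 6  # the UR10e arm has six revolute joints


def build_observation(joint_pos: Sequence[float], joint_vel: Sequence[float],
                      target_xyz: Sequence[float], ee_xyz: Sequence[float]) -> list:
    """Hypothetical target-conditioned observation: robot state plus the
    fruit target position and the end-effector-to-target offset."""
    assert len(joint_pos) == len(joint_vel) == NUM_JOINTS
    offset = [t - e for t, e in zip(target_xyz, ee_xyz)]
    return list(joint_pos) + list(joint_vel) + list(target_xyz) + offset


def action_to_joint_command(joint_pos: Sequence[float], action: Sequence[float],
                            max_delta: float = 0.05) -> list:
    """Map a policy action to an absolute joint-position command.

    Each action component is clipped to [-1, 1] and scaled by max_delta
    (radians per control step, a hypothetical value), so the command for a
    joint never moves more than max_delta away from its current position.
    """
    return [q + max(-1.0, min(1.0, a)) * max_delta
            for q, a in zip(joint_pos, action)]


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate, negated and averaged so it can be minimized.

    logp_new / logp_old are log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                         # r_t(theta)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)                  # pessimistic bound
    return -total / len(advantages)


if __name__ == "__main__":
    obs = build_observation([0.0] * 6, [0.0] * 6, [0.6, 0.1, 0.4], [0.5, 0.0, 0.5])
    print(len(obs))  # 6 joint positions + 6 velocities + 3 target + 3 offset = 18
    print(action_to_joint_command([0.0] * 6, [2.0, -2.0, 0.5, 0.0, 0.0, 0.0]))
    print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
```

In a real training setup the loss would be computed on batched tensors with automatic differentiation. Bounding the per-step joint change is only one simple way to obtain smooth commands; smoothness can also come from reward terms that penalize action rates, and the abstract does not say which mechanism the authors use.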

翻译：本研究提出了一种闭环机器人草莓采摘系统，该系统集成了鲁棒视觉模块、基于仿真训练的深度强化学习（DRL）控制以及基于ROS的真实机器人执行。在感知方面，我们提出了HRAttnEdge-YOLO26-seg，一种改进的YOLO26-seg架构，其通过引入高分辨率P2分支、分割路径注意力以及边缘监督原型学习，提升了杂乱场景中的实例分割性能。在控制方面，我们在Isaac Lab中训练了一种目标条件化的近端策略优化（PPO）策略，用于生成UR10e机械臂的平滑关节位置指令，并将其部署于UR10e机器人上执行目标果实到达与采摘任务。这种基于仿真的方法降低了硬件依赖性和开发成本，使得在实际部署前无需进行大量物理试验即可实现可扩展的策略训练。所提出的视觉模型在所有评估方法中展现出最佳整体性能。在自采集数据集和公开数据集上，该模型的分割性能提升了10%至14%。在受控室内测试中，PPO控制器产生的运动比基于逆运动学（IK）的MoveIt基线更稳定且动态更平滑。在温室试验中，所提出的集成系统成功采摘了281颗草莓，实现了96.6%的到达成功率、91.3%的抓取-拉取成功率以及84.3%的整体采摘成功率。这些结果表明，任务特定感知与仿真训练的PPO相结合，可作为操作中传统依赖规划器的到达方法的实用且资源高效的替代方案，从而在复杂农业环境中实现可靠的闭环机器人采摘。