To automate the harvesting and de-leafing of tomato plants with robots, it is important to search for and detect the relevant plant parts, namely tomatoes, peduncles, and petioles. This is challenging due to the high levels of occlusion in tomato greenhouses. Active vision is a promising approach that lets robots deliberately plan camera viewpoints to overcome occlusion and improve perception accuracy. However, current active-vision algorithms cannot differentiate between relevant and irrelevant plant parts, making them inefficient for targeted perception of specific plant parts. We propose a semantic active-vision strategy that uses semantic information to identify the relevant plant parts and prioritises them during view planning using an attention mechanism. We evaluated our strategy using 3D models of tomato plants with varying structural complexity, which closely represented occlusions in the real world. We used a simulated environment to gain insights into our strategy while ensuring repeatability and statistical significance. After ten viewpoints, our strategy correctly detected 85.5% of the plant parts, about 4 more parts per plant on average than a volumetric active-vision strategy. It also detected 5 and 9 more parts than two predefined strategies and 11 more parts than a random strategy. It performed reliably, with a median of 88.9% correctly detected objects per plant across 96 experiments. Our strategy was also robust to uncertainty in plant and plant-part position, plant complexity, and different viewpoint sampling strategies. We believe that our work could significantly improve the speed and robustness of automated harvesting and de-leafing in tomato crop production.