3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints

In this paper, we propose a reinforcement learning-based control algorithm that uses an independent reward for each joint to control an excavator in 3D space. Excavators are used extensively on construction sites, but their hydraulic structure makes them difficult to control precisely. Precise operation has traditionally depended on operator expertise, which occasionally results in accidents. Equation-based control algorithms have therefore been developed to achieve precise excavator control. However, these methods require prior knowledge of the excavator's physical parameters, which makes them unsuitable for the diverse range of excavators used in the field. To overcome this limitation, we explore reinforcement learning-based control methods, which train models from data and do not require prior knowledge of the specific equipment. Existing reinforcement learning-based methods, however, ignore cabin swing rotation and confine the bucket's workspace to a 2D plane, which limits their applicability on construction sites. We address this issue by expanding the bucket's workspace from a 2D plane to 3D space through the inclusion of cabin swing rotation. With a 3D workspace, an excavator can carry out continuous operations without human intervention. To accomplish this, we set a distinct target for each joint, so that the action values for each joint are trained independently of the learning progress of the other joints.
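
The idea of rewarding each joint against its own target can be illustrated with a minimal sketch. It is not the reward used in this work: the joint names, the use of joint angles in radians as targets, and the negative absolute angle error as the reward shape are all assumptions made for illustration. The point it shows is only that each joint's reward depends on that joint alone, so an error in the swing joint does not change the reward signal seen by the boom, arm, or bucket.

```python
import math
from typing import Dict

# Hypothetical joint names; the abstract only states that each joint has its own target.
JOINTS = ("swing", "boom", "arm", "bucket")


def angle_error(current: float, target: float) -> float:
    """Smallest signed difference between two angles, in radians."""
    diff = target - current
    return math.atan2(math.sin(diff), math.cos(diff))


def per_joint_rewards(current: Dict[str, float], target: Dict[str, float]) -> Dict[str, float]:
    """One reward per joint; each depends only on that joint's own error (assumed shape)."""
    return {name: -abs(angle_error(current[name], target[name])) for name in JOINTS}


def shared_reward(current: Dict[str, float], target: Dict[str, float]) -> float:
    """For contrast: a single scalar in which every joint's error is mixed together."""
    return sum(per_joint_rewards(current, target).values())


# Illustrative values only: every joint is on target except the swing joint.
target = {"swing": 0.5, "boom": 0.2, "arm": -0.3, "bucket": 0.1}
far_swing = dict(target, swing=0.0)
near_swing = dict(target, swing=0.4)

rewards_far = per_joint_rewards(far_swing, target)
rewards_near = per_joint_rewards(near_swing, target)

for name in JOINTS:
    print(f"{name:>6}: far={rewards_far[name]:+.3f} near={rewards_near[name]:+.3f}")
```

With independent rewards, moving the swing joint closer to its target raises only the swing reward and leaves the other three unchanged. Under `shared_reward`, the same change would shift the one signal that all joints learn from.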
