This paper presents a data-driven methodology for the control of static hydraulic impact hammers, also known as rock breakers, which are commonly used in the mining industry. The task addressed in this work is controlling the rock breaker so that its end-effector reaches arbitrary target poses, which is required in normal operation to place the hammer on top of rocks that need to be fractured. The proposed approach accounts for several constraints, such as unobserved state variables due to limited sensing and the strict requirement of using a discrete control interface at the joint level. First, the methodology addresses the problem of system identification to obtain an approximate dynamic model of the hydraulic arm. This is done via supervised learning, using only teleoperation data. The learned dynamic model is then used to obtain a controller capable of reaching target end-effector poses. For policy synthesis, both reinforcement learning (RL) and model predictive control (MPC) algorithms are applied and compared. As a case study, we consider the automation of a Bobcat E10 mini-excavator arm with a hydraulic impact hammer attached as its end-effector. Using this machine, both the system identification and policy synthesis stages are studied in simulation and in the real world. The best RL-based policy consistently reaches target end-effector poses with position errors below 12 cm and pitch angle errors below 0.08 rad in the real world. Considering that the impact hammer has a 4 cm diameter chisel, this level of precision is sufficient for breaking rocks. Notably, this is accomplished using only about 68 min of teleoperation data to train the dynamic model and 8 min to evaluate it, and without any adjustments to achieve a successful Sim2Real policy transfer. A demonstration of policy execution in the real world is available at https://youtu.be/e-7tDhZ4ZgA.
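As a rough illustration of the two-stage pipeline described in the abstract (supervised system identification followed by model-based control through a discrete joint-level interface), the sketch below fits a linear one-step model to synthetic "teleoperation" data and then plans with a small exhaustive-search MPC. Everything here is an assumption made for illustration: the two-joint toy plant, the `STEP` size, the linear model class, the three-valued command set `ACTIONS`, and all function names are hypothetical and are not taken from the paper, whose learned model and controllers may differ substantially.

```python
import itertools
import numpy as np

ACTIONS = (-1, 0, 1)   # hypothetical discrete command per joint
STEP = 0.05            # hypothetical joint displacement per command (toy plant)

def fit_model(states, actions, next_states):
    """Supervised system identification: least-squares fit of
    next_state ~= [state, action, 1] @ W (a linear stand-in model)."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict(W, state, action):
    """One-step prediction with the learned model."""
    return np.concatenate([state, action, [1.0]]) @ W

def plan_action(W, state, target, horizon=2):
    """Exhaustive-search MPC over all discrete command sequences.
    Returns the first command of the sequence with the lowest
    accumulated distance to the target."""
    n_joints = len(state)
    best_cost, best_first = np.inf, np.zeros(n_joints)
    for flat in itertools.product(ACTIONS, repeat=horizon * n_joints):
        seq = np.array(flat).reshape(horizon, n_joints)
        s, cost = state, 0.0
        for a in seq:
            s = predict(W, s, a)
            cost += np.linalg.norm(s - target)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

def run_demo(n_steps=25):
    """Fit the model on synthetic data, then drive the toy plant to a target."""
    rng = np.random.default_rng(0)
    # Synthetic stand-in for logged teleoperation transitions.
    states = rng.uniform(-1.0, 1.0, size=(500, 2))
    actions = rng.choice(ACTIONS, size=(500, 2))
    noise = rng.normal(0.0, 1e-3, size=(500, 2))
    next_states = states + STEP * actions + noise
    W = fit_model(states, actions, next_states)

    state, target = np.zeros(2), np.array([0.5, -0.3])
    for _ in range(n_steps):
        action = plan_action(W, state, target)
        state = state + STEP * action   # toy "real" plant, not the learned model
    return state, target

if __name__ == "__main__":
    final_state, target = run_demo()
    print("final error:", np.linalg.norm(final_state - target))
```

Exhaustive enumeration is only feasible here because the toy problem has two joints and a short horizon; with more joints or longer horizons, a sampling-based planner or a learned RL policy would be the more practical choice.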