In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, which demands skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution is defined by the particle size and number over the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant computation time, making sample collection challenging, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations as pre-training for fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through a series of intermediate-resolution simulations with conservative policy transfer, mitigating the domain gaps that could otherwise cause policy transfer to fail. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than one-seventh while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation. Additional videos and supplementary results are available on our project page: https://yuki-kadokawa.github.io/prpd/
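The coarse-to-fine transfer scheme described above can be sketched as a short curriculum loop. This is a minimal toy illustration, not the paper's implementation: the resolution schedule, the scalar "skill" stand-in for RL training, and the names `train_at_resolution`, `success_rate`, and `prpd` are all assumptions made for exposition. The key idea shown is the conservative-transfer check: the policy only advances to a finer resolution once it still performs acceptably under the resulting domain gap, otherwise it trains further at its current resolution.

```python
# Hypothetical sketch of Progressive-Resolution Policy Distillation (PRPD).
# All functions and thresholds here are illustrative stand-ins, not the
# paper's actual training code or API.

def train_at_resolution(policy, resolution, steps=100):
    """Toy stand-in for RL training in a simulator at the given resolution."""
    skill = policy["skill"]
    for _ in range(steps):
        skill += 0.01 * (1.0 - skill)  # diminishing-returns improvement
    return {"skill": skill, "resolution": resolution}

def success_rate(policy, resolution):
    """Toy evaluation: skill discounted by the coarse-to-fine domain gap."""
    gap = abs(policy["resolution"] - resolution)
    return policy["skill"] * (1.0 - gap)

def prpd(resolutions, transfer_threshold=0.5):
    """Progressively distill a policy from coarse to fine resolutions."""
    policy = {"skill": 0.0, "resolution": resolutions[0]}
    for res in resolutions:
        # Conservative transfer: only move to the next resolution once the
        # current policy still performs acceptably there (small domain gap).
        if res != resolutions[0] and success_rate(policy, res) < transfer_threshold:
            # Fall back to extra training at the current resolution first.
            policy = train_at_resolution(policy, policy["resolution"])
        # The pre-trained policy initializes training at the new resolution,
        # standing in for the distillation step.
        policy = train_at_resolution(policy, res)
    return policy

# Coarse (particle scale 1.0) -> intermediate -> fine (0.1) schedule.
final_policy = prpd([1.0, 0.7, 0.4, 0.1])
```

In this toy setup, training only at the finest resolution from scratch would need many more (expensive) fine-resolution steps to reach the same skill, which mirrors the sampling-time savings the abstract reports.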