Autonomous Obstacle Removal for Excavators through Policy Learning with Particle Simulation

Autonomous removal of obstacles from the ground is an important earthwork task, but it is difficult to automate because an excavator must adapt its excavation trajectories over repeated cycles as soil-obstacle conditions change. Learning such state-dependent behavior requires a training environment that reproduces accumulated soil-obstacle interactions, including contact states, terrain deformation, and obstacle visibility. Particle-based simulation is therefore well suited to learning such policies. However, particle simulation is computationally expensive, and repeated excavation cycles further increase the learning cost. We observe that the burial condition of an obstacle governs both task difficulty and simulation cost: deeper burial makes the obstacle harder to remove and also requires more particles for accurate simulation. This observation motivates a burial-conditioned curriculum learning strategy. We propose a time-efficient sim-to-real policy learning framework in which the policy observes terrain and obstacle information from RGB-D measurements and outputs a parameterized excavation trajectory; the simulator reproduces the same observation-action interface as the real excavator under controllable burial conditions. The curriculum begins with shallow burial and progressively increases burial depth while adjusting the particle count, thereby controlling task difficulty and simulation cost simultaneously. Experiments show that the proposed framework learns an effective obstacle-removal policy, whereas baseline methods fail even after a full week of training. The proposed curriculum reaches effective performance within three days, and the learned policy transfers successfully to a real 12-ton excavator operating on open ground with various steel obstacles, demonstrating robust obstacle removal.
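The burial-conditioned curriculum described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear interpolation between stages, the success-rate threshold for advancing, and all names and numbers (`CurriculumStage`, `build_curriculum`, `next_stage_index`, the depth and particle ranges) are hypothetical assumptions chosen only to show how burial depth and particle count might be scheduled together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumStage:
    """One curriculum stage (hypothetical structure, not from the paper)."""
    burial_depth_m: float   # how deep the obstacle is buried
    particle_count: int     # soil particles simulated at this stage


def build_curriculum(num_stages, min_depth_m, max_depth_m,
                     min_particles, max_particles):
    """Build stages from shallow/cheap to deep/expensive.

    Assumption: depth and particle count grow linearly with the stage
    index. The abstract only states that both are adjusted together.
    """
    if num_stages < 2:
        raise ValueError("need at least two stages")
    stages = []
    for i in range(num_stages):
        frac = i / (num_stages - 1)
        depth = min_depth_m + frac * (max_depth_m - min_depth_m)
        particles = round(min_particles + frac * (max_particles - min_particles))
        stages.append(CurriculumStage(depth, particles))
    return stages


def next_stage_index(current, success_rate, threshold, num_stages):
    """Advance one stage once the policy's success rate reaches the threshold.

    Assumption: a success-rate gate is one common way to pace a curriculum;
    the source does not specify the advancement rule.
    """
    if success_rate >= threshold and current < num_stages - 1:
        return current + 1
    return current


# Illustrative values only.
stages = build_curriculum(3, 0.0, 0.4, 10_000, 50_000)
```

Early stages in such a schedule are both easier and cheaper to simulate, which is the coupling the abstract exploits to reduce training time.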
