Reinforcement Learning (RL) has the potential to enable extreme off-road mobility by circumventing complex kinodynamic modeling, planning, and control through end-to-end trial-and-error learning in simulation. However, most RL methods are sample-inefficient when training across a large number of manually designed simulation environments and struggle to generalize to the real world. To address these issues, we introduce Verti-Selector (VS), an automatic curriculum learning framework designed to enhance learning efficiency and generalization by selectively sampling training terrain. VS prioritizes vertically challenging terrain that yields higher Temporal Difference (TD) errors when revisited, thereby allowing robots to learn at the edge of their evolving capabilities. By dynamically adjusting the sampling focus, VS significantly boosts sample efficiency and generalization within the VW-Chrono simulator built on the Chrono multi-physics engine. Furthermore, we provide simulation and physical results using VS on a Verti-4-Wheeler platform. These results demonstrate that VS achieves a 23.08% improvement in success rate by sampling efficiently during training and generalizing robustly to the real world.
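The TD-error-prioritized terrain sampling described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the class name `VertiSelector`, the methods `sample`/`update`, and the parameters `alpha` and `eps` are all hypothetical, and the scoring rule (proportional to the magnitude of the last observed TD error) is one plausible instantiation of "prioritize terrain with higher TD errors when revisited".

```python
import random

class VertiSelector:
    """Minimal sketch of TD-error-prioritized terrain sampling.

    Hypothetical interface: names and the exact scoring rule are
    illustrative, not the paper's actual algorithm.
    """

    def __init__(self, terrains, alpha=1.0, eps=1e-3):
        self.terrains = list(terrains)
        self.alpha = alpha   # how sharply to favor high-TD-error terrain
        self.eps = eps       # keeps low-error terrain sampleable
        # Last observed TD error per terrain; None means not yet visited.
        self.td_errors = {t: None for t in self.terrains}

    def sample(self):
        # Score each terrain by the magnitude of its last TD error;
        # unvisited terrain gets a uniform baseline score of 1.0.
        scores = [
            (abs(self.td_errors[t]) + self.eps) ** self.alpha
            if self.td_errors[t] is not None else 1.0
            for t in self.terrains
        ]
        return random.choices(self.terrains, weights=scores, k=1)[0]

    def update(self, terrain, td_error):
        # After an episode, record the TD error observed on this terrain,
        # so harder (higher-error) terrain is revisited more often.
        self.td_errors[terrain] = td_error
```

A training loop would alternate `terrain = selector.sample()`, run an episode on that terrain, then call `selector.update(terrain, mean_td_error)`, so the sampling distribution tracks the robot's evolving capabilities.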