Curriculum learning has proven highly effective in robot learning, but it still faces limitations when scaling to complex, wide-ranging task spaces. Such task spaces often lack a well-defined difficulty structure, making the difficulty ordering required by previous methods hard to specify. We propose a Learning Progress-based Automatic Curriculum Reinforcement Learning (LP-ACRL) framework, which estimates the agent's learning progress online and adaptively adjusts the task-sampling distribution, thereby enabling automatic curriculum generation without prior knowledge of the difficulty distribution over the task space. Policies trained with LP-ACRL enable the ANYmal D quadruped to achieve and maintain stable, high-speed locomotion at 2.5 m/s linear velocity and 3.0 rad/s angular velocity across diverse terrains, including stairs, slopes, gravel, and low-friction flat surfaces, whereas previous methods have generally been limited to high speeds on flat terrain or low speeds on complex terrain. Experimental results demonstrate that LP-ACRL exhibits strong scalability and real-world applicability, providing a robust baseline for future research on curriculum generation in complex, wide-ranging robotic learning task spaces.
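The core loop the abstract describes, estimating per-task learning progress online and reshaping the task-sampling distribution accordingly, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the EMA-difference progress estimate, and all parameter values are assumptions chosen for clarity.

```python
import numpy as np

class LPTaskSampler:
    """Hypothetical sketch of learning-progress-based task sampling.

    Learning progress for each task is approximated as the absolute
    difference between a fast and a slow exponential moving average (EMA)
    of episode returns: tasks whose performance is changing quickly get
    sampled more often. All hyperparameters here are illustrative.
    """

    def __init__(self, n_tasks, alpha_fast=0.3, alpha_slow=0.05,
                 eps_uniform=0.2, seed=0):
        self.fast = np.zeros(n_tasks)   # fast EMA of per-task return
        self.slow = np.zeros(n_tasks)   # slow EMA of per-task return
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.eps = eps_uniform          # uniform-mixing floor
        self.rng = np.random.default_rng(seed)

    def update(self, task, episode_return):
        # Online update of both EMAs after an episode on `task`.
        self.fast[task] += self.alpha_fast * (episode_return - self.fast[task])
        self.slow[task] += self.alpha_slow * (episode_return - self.slow[task])

    def probs(self):
        # Learning-progress estimate: |fast EMA - slow EMA| per task.
        lp = np.abs(self.fast - self.slow)
        n = len(lp)
        p = lp / lp.sum() if lp.sum() > 0 else np.full(n, 1.0 / n)
        # Mix with a uniform distribution so no task is starved,
        # preserving exploration over the whole task space.
        return (1.0 - self.eps) * p + self.eps / n

    def sample(self):
        # Draw the next training task from the adaptive distribution.
        return int(self.rng.choice(len(self.fast), p=self.probs()))
```

In a locomotion setting, each "task" could be a bin over commanded velocity and terrain type; the sampler then concentrates training on the velocity/terrain regions where the policy is currently improving fastest, without requiring any predefined difficulty ordering.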