Data-driven, learning-based methods have recently been particularly successful at learning robust locomotion controllers for a variety of unstructured terrains. Prior work has shown that incorporating good locomotion priors in the form of trajectory generators (TGs) is an effective way to learn complex locomotion skills efficiently. However, defining a single good TG becomes increasingly difficult as tasks and environments grow more complex: it requires extensive tuning and risks reducing the effectiveness of the prior. In this paper, we present Evolved Environmental Trajectory Generators (EETG), a method that learns a diverse set of specialized locomotion priors using Quality-Diversity algorithms while maintaining a single policy within the Policies Modulating TG (PMTG) architecture. The results demonstrate that EETG enables a quadruped robot to successfully traverse a wide range of environments, such as slopes, stairs, rough terrain, and balance beams. Our experiments show that learning a diverse set of specialized TG priors is significantly (5 times) more efficient than using a single, fixed prior when dealing with a wide range of environments.