Deep reinforcement learning has recently emerged as an appealing approach to legged locomotion over diverse terrains: a policy is trained in physics simulation and then transferred to the real world (i.e., sim-to-real transfer). Despite considerable progress, the limited capacity and scalability of conventional neural networks may hinder their application in more complex environments. In contrast, the Transformer architecture has shown strong performance across a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformers in sim-to-real scenarios, we present a novel two-stage training framework, consisting of an offline pretraining stage and an online correction stage, that naturally integrates the Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption, and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including a sand pit and descending stairs, on which strong baselines fail.
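The abstract does not specify how the offline pretraining and online correction stages are implemented. One common way to combine privileged training with imitation is teacher-student distillation with DAgger-style data aggregation, and the sketch below illustrates that generic pattern only, under stated assumptions. It is not TERT's actual method. Everything in it is hypothetical: the toy 1-D dynamics, the functions `teacher_action`, `observe`, `rollout` and `train`, and the linear least-squares student, which stands in for the Transformer. In stage 1, a privileged teacher (which sees the terrain height `h`) generates trajectories, and the student is fitted to imitate them. In stage 2, the student is rolled out, the states it actually reaches are relabeled with teacher actions, the data is aggregated, and the student is refitted.

```python
import random
from typing import Callable, List, Tuple

# All names below are hypothetical illustrations, not TERT's actual code.
Obs = Tuple[float, float]
Policy = Callable[[float, float], float]  # (state x, terrain h) -> action


def teacher_action(x: float, h: float) -> float:
    # Privileged teacher: reads the true terrain height h directly.
    return -x - 0.5 * h


def observe(x: float, h: float) -> Obs:
    # Student observation: h is never given directly, only entangled with x.
    return (x, x + h)


def fit_linear(data: List[Tuple[Obs, float]]) -> Tuple[float, float]:
    # Least-squares fit of a = w0*o0 + w1*o1 via the 2x2 normal equations.
    s00 = sum(o[0] * o[0] for o, _ in data)
    s01 = sum(o[0] * o[1] for o, _ in data)
    s11 = sum(o[1] * o[1] for o, _ in data)
    b0 = sum(o[0] * a for o, a in data)
    b1 = sum(o[1] * a for o, a in data)
    det = s00 * s11 - s01 * s01
    return ((s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det)


def student_policy(w: Tuple[float, float]) -> Policy:
    # Stand-in for the Transformer student: linear in the observation only,
    # so it never reads h except through observe().
    def act(x: float, h: float) -> float:
        o = observe(x, h)
        return w[0] * o[0] + w[1] * o[1]
    return act


def rollout(policy: Policy, rng: random.Random,
            horizon: int = 20) -> List[Tuple[float, float]]:
    # Run `policy` in toy 1-D dynamics; return the privileged states visited.
    x, visited = rng.uniform(-1.0, 1.0), []
    for _ in range(horizon):
        h = rng.uniform(-0.2, 0.2)  # terrain height under the robot this step
        visited.append((x, h))
        x = x + 0.5 * policy(x, h)
    return visited


def label(states: List[Tuple[float, float]],
          dataset: List[Tuple[Obs, float]]) -> None:
    # Relabel every visited state with the privileged teacher's action.
    for x, h in states:
        dataset.append((observe(x, h), teacher_action(x, h)))


def train(seed: int = 0, offline_episodes: int = 10,
          correction_rounds: int = 3, episodes_per_round: int = 5):
    rng = random.Random(seed)
    dataset: List[Tuple[Obs, float]] = []
    # Stage 1 (offline pretraining): imitate trajectories driven by the teacher.
    for _ in range(offline_episodes):
        label(rollout(teacher_action, rng), dataset)
    w = fit_linear(dataset)
    # Stage 2 (online correction): roll out the *student*, relabel the states
    # it actually reaches with teacher actions, aggregate, and refit.
    for _ in range(correction_rounds):
        policy = student_policy(w)
        for _ in range(episodes_per_round):
            label(rollout(policy, rng), dataset)
        w = fit_linear(dataset)
    return w, len(dataset)
```

Calling `weights, n = train()` collects 10 × 20 teacher samples plus 3 × 5 × 20 student samples (500 in total). The weights come out at about (-0.5, -0.5), because the teacher action equals -0.5·o0 - 0.5·o1 in terms of the observation. In this toy setting, the observation determines the teacher action exactly, so the offline fit is already exact and stage 2 only illustrates the data flow. The correction stage matters when an imperfect student drifts into states that the teacher's own trajectories never covered.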