Humanoid robots that can operate autonomously in diverse environments have the potential to help address labour shortages in factories, assist the elderly at home, and colonize new planets. While classical controllers for humanoid robots have shown impressive results in a number of settings, they are difficult to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesize that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We train our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deploy it to the real world zero-shot. Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.
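To make the controller's interface concrete, the following is a minimal sketch of a causal attention policy over an observation-action history. The abstract does not specify the tokenization, depth, number of heads, or dimensions, so everything here is an assumption for illustration: the name `CausalPolicySketch`, one token per timestep formed by concatenating the observation with the previous action, a single attention head with a residual connection, no positional encoding, and random weights standing in for parameters that would be trained with reinforcement learning.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CausalPolicySketch:
    """Hypothetical single-head, single-layer causal attention policy."""

    def __init__(self, obs_dim, act_dim, d_model=8, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Assumption: each timestep becomes one token from [observation, previous action].
        self.W_embed = rand_matrix(d_model, obs_dim + act_dim, rng)
        self.W_q = rand_matrix(d_model, d_model, rng)
        self.W_k = rand_matrix(d_model, d_model, rng)
        self.W_v = rand_matrix(d_model, d_model, rng)
        self.W_out = rand_matrix(act_dim, d_model, rng)

    def forward(self, observations, prev_actions):
        """Return one predicted action per timestep in the history."""
        # `o + a` concatenates the two Python lists into one input vector.
        tokens = [matvec(self.W_embed, o + a)
                  for o, a in zip(observations, prev_actions)]
        q = [matvec(self.W_q, tok) for tok in tokens]
        k = [matvec(self.W_k, tok) for tok in tokens]
        v = [matvec(self.W_v, tok) for tok in tokens]
        outputs = []
        for t in range(len(tokens)):
            # Causal mask: position t attends only to positions 0..t.
            scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(self.d_model)
                      for s in range(t + 1)]
            weights = softmax(scores)
            context = [sum(w * v[s][j] for s, w in enumerate(weights))
                       for j in range(self.d_model)]
            # Residual connection, then a linear readout to an action.
            hidden = [tok + c for tok, c in zip(tokens[t], context)]
            outputs.append(matvec(self.W_out, hidden))
        return outputs

    def act(self, observations, prev_actions):
        """Predict the next action from the full history (last position)."""
        return self.forward(observations, prev_actions)[-1]


# Illustrative history of three timesteps (values are made up).
policy = CausalPolicySketch(obs_dim=3, act_dim=2)
history_obs = [[0.1, 0.0, -0.2], [0.2, 0.1, -0.1], [0.3, 0.1, 0.0]]
history_act = [[0.0, 0.0], [0.05, -0.05], [0.1, 0.0]]
next_action = policy.act(history_obs, history_act)
```

In this sketch the weights are fixed at inference time; any change in behavior comes only from the contents of the history passed to `act`, which is the sense in which such a policy could adapt in context without weight updates.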