Model-based RL is a promising approach for real-world robotics because it offers better sample efficiency and generalisation than model-free RL. However, applying model-based RL to vision-based real-world tasks requires bridging the sim-to-real gap for the learned world model. Standard domain randomisation is not an effective solution to this problem because of its significant computational cost. This paper proposes TWIST (Teacher-Student World Model Distillation for Sim-to-Real Transfer), which uses distillation to achieve efficient sim-to-real transfer for vision-based model-based RL. TWIST leverages state observations, privileged information that is readily available in a simulator, to significantly accelerate sim-to-real transfer. Specifically, a teacher world model is trained efficiently on state information while a matching dataset of domain-randomised image observations is collected. The teacher world model then supervises a student world model that takes the domain-randomised image observations as input. By distilling the learned latent dynamics model from the teacher to the student, TWIST achieves efficient and effective sim-to-real transfer for vision-based model-based RL tasks. Experiments on simulated and real robotics tasks demonstrate that our approach outperforms naive domain randomisation and model-free methods in both sample efficiency and task performance after sim-to-real transfer.
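The teacher-student supervision described in the abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not taken from the paper: the frozen linear teacher `W_TEACHER`, the toy `render_randomised` function standing in for domain-randomised images, the linear student, and the mean-squared latent-matching loss. It also distils only an encoder latent, whereas TWIST distils a full latent dynamics model whose architecture and loss the abstract does not specify.

```python
import random

random.seed(0)

STATE_DIM, OBS_DIM, LATENT_DIM = 2, 3, 2

# Hypothetical frozen teacher encoder: latent = W_TEACHER @ state.
W_TEACHER = [[1.0, 0.5], [-0.5, 1.0]]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def teacher_latent(state):
    """Latent produced by the teacher from privileged state information."""
    return matvec(W_TEACHER, state)


def render_randomised(state):
    """Toy stand-in for a domain-randomised image (assumption).

    Two channels mix the state; the third is a randomised nuisance
    channel that carries no information about the state.
    """
    nuisance = random.uniform(-1.0, 1.0)
    return [state[0] + state[1], state[0] - state[1], nuisance]


def sample_pair():
    """One matched (observation, teacher latent) training pair."""
    state = [random.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
    return render_randomised(state), teacher_latent(state)


def distillation_loss(w_student, obs, target):
    """Mean squared error between student and teacher latents."""
    pred = matvec(w_student, obs)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def sgd_step(w_student, obs, target, lr):
    """One in-place gradient step on the distillation loss."""
    pred = matvec(w_student, obs)
    for i in range(LATENT_DIM):
        err = pred[i] - target[i]
        for j in range(OBS_DIM):
            w_student[i][j] -= lr * 2.0 * err * obs[j] / LATENT_DIM


def mean_loss(w_student, pairs):
    """Average distillation loss over a list of pairs."""
    return sum(distillation_loss(w_student, o, t) for o, t in pairs) / len(pairs)


def train_student(steps=5000, lr=0.05):
    """Train a linear student to reproduce the teacher latent from observations."""
    w_student = [[0.0] * OBS_DIM for _ in range(LATENT_DIM)]
    for _ in range(steps):
        obs, target = sample_pair()
        sgd_step(w_student, obs, target, lr)
    return w_student


eval_pairs = [sample_pair() for _ in range(200)]
initial_loss = mean_loss([[0.0] * OBS_DIM for _ in range(LATENT_DIM)], eval_pairs)
w_student = train_student()
final_loss = mean_loss(w_student, eval_pairs)
print(f"distillation loss: {initial_loss:.4f} -> {final_loss:.2e}")
```

In this toy setting the student drives the weights on the nuisance channel towards zero, which is the behaviour one would hope distillation under domain randomisation encourages: the student latent tracks the teacher latent regardless of the randomised part of the observation.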