Mobile robots are essential in applications such as autonomous delivery and hospitality services. Learning-based methods for mobile robot tasks have gained popularity for their robustness and generalizability. Traditional methods such as Imitation Learning (IL) and Reinforcement Learning (RL) offer adaptability, but they require large datasets and carefully crafted reward functions and suffer from sim-to-real gaps, which makes efficient and safe real-world deployment challenging. We propose PVP4Real, an online human-in-the-loop learning method that combines IL and RL to address these issues. PVP4Real enables efficient real-time policy learning from online human intervention and demonstration, without reward or any pretraining, significantly improving data efficiency and training safety. We validate our method by training two different robots, a legged quadruped and a wheeled delivery robot, on two mobile robot tasks, one of which uses raw RGBD images as observations. Training finishes within 15 minutes. Our experiments show the promise of human-in-the-loop learning for addressing data efficiency in real-world robotic tasks. More information is available at: https://metadriverse.github.io/pvp4real/