Autonomous self-improving robots that interact with their environment and improve with experience are key to the real-world deployment of robotic systems. In this paper, we propose SELFI, an online learning method that leverages online robot experience to rapidly fine-tune pre-trained control policies. SELFI applies online model-free reinforcement learning on top of offline model-based learning to combine the strengths of both learning paradigms. Specifically, SELFI stabilizes the online learning process by incorporating the same model-based learning objective used in offline pre-training into the Q-values learned with online model-free reinforcement learning. We evaluate SELFI in multiple real-world environments and report improvements in collision avoidance, as well as more socially compliant behavior as measured by a human user study. With fewer human interventions, SELFI quickly learns useful robotic behaviors such as pre-emptive behavior around pedestrians, collision avoidance for small and transparent objects, and avoidance of travel on uneven floor surfaces. We provide supplementary videos demonstrating the performance of our fine-tuned policy on our project page.
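The idea of folding a model-based objective into learned Q-values can be illustrated with a toy sketch. The abstract does not specify how the two terms are combined, so the weighted sum below, along with the names `combined_score`, `select_action`, `model_based_objective`, `q_value`, and `weight`, are assumptions made purely for illustration and are not the SELFI implementation.

```python
from typing import Callable, Sequence

Scorer = Callable[[float, float], float]


def combined_score(model_based_objective: Scorer, q_value: Scorer,
                   state: float, action: float, weight: float = 1.0) -> float:
    """Assumed combination: model-based objective plus a weighted Q-value."""
    return model_based_objective(state, action) + weight * q_value(state, action)


def select_action(model_based_objective: Scorer, q_value: Scorer,
                  state: float, candidates: Sequence[float],
                  weight: float = 1.0) -> float:
    """Pick the candidate action with the highest combined score."""
    return max(
        candidates,
        key=lambda a: combined_score(model_based_objective, q_value, state, a, weight),
    )


# Toy 1-D setting (hypothetical): the state is a position, the action a step.
GOAL = 2.0


def toy_model_objective(state: float, action: float) -> float:
    # Stand-in for the offline model-based objective: get closer to the goal.
    return -abs(GOAL - (state + action))


# Stand-in for Q-values learned online: stepping forward was found to be costly
# (for example, it led to a collision the offline model did not predict).
LEARNED_Q = {1.0: -5.0}


def toy_q_value(state: float, action: float) -> float:
    return LEARNED_Q.get(action, 0.0)


actions = [-1.0, 0.0, 1.0]
offline_only = select_action(toy_model_objective, toy_q_value, 0.0, actions, weight=0.0)
with_online_q = select_action(toy_model_objective, toy_q_value, 0.0, actions, weight=1.0)
```

In this toy setting the offline objective alone prefers stepping toward the goal, while adding the online Q-value term shifts the choice to staying in place. Because the model-based term stays in the score, the policy remains anchored to its pre-trained behavior wherever the learned Q-values are uninformative.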