Autonomous robots that improve through real-world experience are key to the deployment of robotic systems. In this paper, we propose SELFI, an online learning method that leverages online robot experience to rapidly and efficiently fine-tune pre-trained control policies. SELFI applies online model-free reinforcement learning on top of offline model-based learning to combine the strengths of both paradigms. Specifically, SELFI stabilizes the online learning process by incorporating the model-based objective used during offline pre-training into the Q-values learned with online model-free reinforcement learning. We evaluate SELFI in multiple real-world environments and report improvements in collision avoidance, as well as more socially compliant behavior, as measured by a human user study. SELFI enables us to quickly learn useful robot behaviors with fewer human interventions, such as pre-emptive behavior around pedestrians, collision avoidance for small and transparent objects, and avoidance of uneven floor surfaces. We provide supplementary videos demonstrating the performance of our fine-tuned policy on our project page.
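To make the core idea concrete, the combination described above can be sketched as scoring each candidate action by the sum of the offline model-based objective and an online-learned residual Q-value, so that early in fine-tuning the policy falls back on the pre-trained objective. This is a minimal illustrative sketch, not the paper's implementation: `model_based_objective`, the linear `residual_q`, and the discrete candidate-action set are all hypothetical stand-ins.

```python
import numpy as np

def model_based_objective(state, action):
    # Hypothetical stand-in for the offline model-based objective,
    # e.g. a pre-trained model's score for taking `action` in `state`.
    return -float(np.sum((state + action) ** 2))

def residual_q(state, action, theta):
    # Hypothetical learned Q-function (linear in features, for illustration),
    # updated online with model-free reinforcement learning.
    return float(theta @ np.concatenate([state, action]))

def combined_q(state, action, theta):
    # The combination described in the abstract: the online model-free
    # Q-value is added on top of the offline model-based objective,
    # which stabilizes learning early in fine-tuning.
    return model_based_objective(state, action) + residual_q(state, action, theta)

def select_action(state, candidate_actions, theta):
    # Greedy selection over a discrete set of candidate actions.
    scores = [combined_q(state, a, theta) for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]
```

With `theta` initialized to zeros, `select_action` reduces to acting greedily under the pre-trained model-based objective alone; as online experience shapes the residual Q-function, the behavior departs from the pre-trained policy where experience warrants it.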