The ability to learn and refine behavior after deployment has become ever more important for robots as we design them to operate in unstructured environments like households. In this work, we design a new learning system based on large language model (LLM), OLAF, that allows everyday users to teach a robot using verbal corrections when the robot makes mistakes, e.g., by saying "Stop what you're doing. You should move closer to the cup." A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on the verbal feedback to avoid repeating mistakes in the future. This is in contrast to existing LLM-based robotic systems, which only follow verbal commands or corrections but not learn from them. We demonstrate the efficacy of our design in experiments where a user teaches a robot to perform long-horizon manipulation tasks both in simulation and on physical hardware, achieving on average 20.0% improvement in policy success rate. Videos and more results are at https://ut-austin-rpl.github.io/olaf/
翻译:在机器人被设计应用于家庭等非结构化环境的背景下,部署后学习和改进行为的能力变得愈发重要。在本研究中,我们设计了一个基于大语言模型(LLM)的新型学习系统OLAF,使普通用户能够在机器人犯错时通过口头纠正来教授机器人,例如,通过说“停止你正在做的事。你应该靠近杯子。”。OLAF的一个关键特性是能够根据口头反馈更新机器人的视觉运动神经策略,以避免在未来重复错误。这与现有的基于LLM的机器人系统形成对比,后者仅遵循口头命令或纠正,但不会从中学习。我们通过实验证明了我们的设计的有效性,在实验中,用户教授机器人执行长周期操作任务,无论是在仿真环境中还是在实际硬件上,策略成功率平均提高了20.0%。视频和更多结果请访问https://ut-austin-rpl.github.io/olaf/