A command-following robot that serves people in everyday life must continually improve itself in its deployment domain with minimal help from end users rather than engineers. Previous methods are either difficult to improve continually after deployment or require a large number of new labels during fine-tuning. Motivated by (self-)supervised contrastive learning, we propose a novel representation that generates an intrinsic reward function for command-following robot tasks by associating images with sound commands. After the robot is deployed in a new domain, non-experts can update the representation intuitively and data-efficiently, without any hand-crafted reward functions. We demonstrate our approach on various sound types and robotic tasks, including navigation and manipulation with raw sensor inputs. In simulated and real-world experiments, we show that our system can continually self-improve in previously unseen scenarios with less newly labeled data, while still achieving better performance than previous methods.