In self-supervised robot learning, robots actively explore their environments and generate data by acting on entities in the environment. An exploration policy is therefore desired that ensures sample efficiency, minimizing robot execution costs while still yielding accurate learning. For this purpose, the robotics community has adopted Intrinsic Motivation (IM)-based approaches such as Learning Progress (LP). On the machine learning front, Active Learning (AL) has been used successfully, especially for classification tasks. In this work, we develop a novel AL framework geared towards robotic regression tasks, such as action-effect prediction and, more generally, world model learning, which we call MUSEL - Model Uncertainty for Sample Efficient Learning. MUSEL aims to extract model uncertainty from the total uncertainty estimate given by a suitable learning engine by making use of learning progress and input diversity, and to use it to improve sample efficiency beyond state-of-the-art action-effect prediction methods. We demonstrate the feasibility of our model by using a Stochastic Variational Gaussian Process (SVGP) as the learning engine and testing the system on a set of simulated robotic experiments. The efficacy of MUSEL is demonstrated by comparing its performance to standard methods used in robot action-effect learning. In a tabletop environment in which a robot manipulator is tasked with learning the effects of its actions, the experiments show that MUSEL attains higher accuracy in learning action effects while ensuring sample efficiency.