One promising approach to effective robot decision making in complex, long-horizon tasks is to sequence parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: "how much would the competence improve through practice?"), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Experiments in the real world demonstrate our approach's ability to handle noise from perception and control and improve the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu
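To make the estimate, extrapolate, and situate steps concrete, the following is a minimal Python sketch and not the system's actual implementation. Every name and modeling choice in it is an assumption made for illustration: competence is estimated as a Beta-Bernoulli posterior mean over observed successes and failures, extrapolation uses a hypothetical constant `gain` that closes a fixed fraction of the gap to perfect competence, and "competence-aware planning" is reduced to picking, for each task, the candidate skill sequence with the highest product of skill competences.

```python
from math import prod


def estimate_competence(successes, failures, prior=(1.0, 1.0)):
    """Estimate: posterior-mean success rate under an assumed Beta prior."""
    a, b = prior
    return (successes + a) / (successes + failures + a + b)


def extrapolate_competence(competence, gain=0.3):
    """Extrapolate: assumed model in which practice closes a fixed
    fraction (`gain`, hypothetical) of the gap to perfect competence."""
    return competence + gain * (1.0 - competence)


def expected_task_success(tasks, competence):
    """Situate: each task is a list of candidate plans (skill sequences).
    A competence-aware planner is assumed to pick the plan with the highest
    success probability, taken here as the product of skill competences."""
    per_task = [
        max(prod(competence[skill] for skill in plan) for plan in plans)
        for plans in tasks
    ]
    return sum(per_task) / len(per_task)


def choose_skill_to_practice(outcomes, tasks, gain=0.3):
    """Return the skill whose extrapolated competence yields the largest
    expected task success over the given task set."""
    current = {
        skill: estimate_competence(wins, losses)
        for skill, (wins, losses) in outcomes.items()
    }
    best_skill, best_value = None, float("-inf")
    for skill in current:
        # Imagine practicing only this skill, then re-plan every task.
        hypothetical = dict(current)
        hypothetical[skill] = extrapolate_competence(current[skill], gain)
        value = expected_task_success(tasks, hypothetical)
        if value > best_value:
            best_skill, best_value = skill, value
    return best_skill
```

Under these assumptions, a weak skill is worth practicing only if it appears in plans the planner would actually select, and a skill that is already reliable gains little from further practice. This is the intuition behind situating each skill in the task distribution rather than simply practicing the skill with the lowest competence.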