One promising approach to effective robot decision making in complex, long-horizon tasks is to sequence parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: "how much would the competence improve through practice?"), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Real-world experiments show that the approach handles noise from perception and control and improves the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice.
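The estimate, extrapolate, and situate steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the Beta-Bernoulli competence estimate, the fixed-fraction improvement model (`gain`), the independence assumption behind `plan_success`, and every name below (`SkillStats`, `choose_skill_to_practice`, the example skills and plans) are assumptions introduced here only to make the idea concrete.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class SkillStats:
    """Practice outcomes recorded for one skill (hypothetical bookkeeping)."""
    successes: int = 0
    failures: int = 0

    def competence(self) -> float:
        # Estimate: posterior mean of a Beta(1, 1) prior updated with outcomes.
        return (self.successes + 1) / (self.successes + self.failures + 2)

    def extrapolated_competence(self, gain: float = 0.2) -> float:
        # Extrapolate (assumed model): practice closes a fixed fraction `gain`
        # of the remaining gap to perfect competence.
        c = self.competence()
        return c + gain * (1.0 - c)


def plan_success(plan, competences):
    # Assume skill outcomes are independent, so a plan succeeds with the
    # product of the competences of the skills it sequences.
    return prod(competences[name] for name in plan)


def choose_skill_to_practice(stats, plans):
    """Situate: pick the skill whose practice most raises mean plan success."""
    current = {name: s.competence() for name, s in stats.items()}
    baseline = sum(plan_success(p, current) for p in plans) / len(plans)

    best_skill, best_gain = None, float("-inf")
    for name, s in stats.items():
        improved = dict(current)
        improved[name] = s.extrapolated_competence()
        value = sum(plan_success(p, improved) for p in plans) / len(plans)
        if value - baseline > best_gain:
            best_skill, best_gain = name, value - baseline
    return best_skill


if __name__ == "__main__":
    # Hypothetical skills and sampled plans standing in for a task distribution.
    stats = {
        "pick": SkillStats(successes=9, failures=1),
        "place": SkillStats(successes=2, failures=8),
        "sweep": SkillStats(successes=5, failures=5),
    }
    plans = [["pick", "place"], ["pick", "place"], ["pick", "sweep"]]
    print(choose_skill_to_practice(stats, plans))
```

In this toy setup the weakest skill that also appears in most plans is selected, which reflects the intuition that practice should target skills that are both improvable and relevant to the tasks the robot expects to face.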