Operating under real-world conditions is challenging because execution errors and state uncertainty can induce a wide range of failures. In relatively benign settings, such failures can be overcome by retrying or by executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, such as opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm called Meta-Reasoning for Skill Learning (MetaReSkill) that monitors the progress of all recovery policies during training and allocates training resources to the recoveries that are likely to improve task performance the most. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially, from 71% to 92.4% in simulation and from 75% to 90% on a real robot.
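The idea of monitoring recovery policies and directing training effort to where it is likely to help most can be illustrated with a minimal sketch. This is not the MetaReSkill algorithm, whose details the abstract does not give; it is a simplified greedy heuristic under stated assumptions. The names `allocate_training`, `make_toy_trainer`, `failure_probs`, and `caps`, the geometric learning curves, and the model "task success = nominal success + sum of (failure probability × recovery success)" are all hypothetical and chosen only for illustration.

```python
def make_toy_trainer(caps):
    """Return a hypothetical train_round(i) with diminishing returns.

    After k rounds, recovery i succeeds with rate caps[i] * (1 - 0.5**k).
    This learning curve is an assumption, not taken from the paper.
    """
    rounds = [0] * len(caps)

    def train_round(i):
        rounds[i] += 1
        return caps[i] * (1.0 - 0.5 ** rounds[i])

    return train_round


def allocate_training(failure_probs, train_round, budget):
    """Greedily spend `budget` training rounds across recovery skills.

    failure_probs[i]: assumed probability that failure mode i occurs.
    train_round(i):   trains recovery i once, returns its success rate.
    """
    n = len(failure_probs)
    success = [0.0] * n      # latest observed success rate per recovery
    last_gain = [0.0] * n    # most recent improvement per recovery
    allocation = [0] * n     # number of rounds given to each recovery

    for step in range(budget):
        if step < n:
            # Warm-up: train every recovery once to observe its progress.
            i = step
        else:
            # Pick the recovery whose recent progress, weighted by how
            # often its failure mode occurs, promises the largest gain.
            i = max(range(n), key=lambda k: failure_probs[k] * last_gain[k])
        new_success = train_round(i)
        last_gain[i] = max(new_success - success[i], 0.0)
        success[i] = new_success
        allocation[i] += 1

    # Assumed task model: nominal success plus recovered failures.
    nominal = 1.0 - sum(failure_probs)
    task_success = nominal + sum(p * s for p, s in zip(failure_probs, success))
    return allocation, success, task_success
```

As a usage example, calling `allocate_training([0.2, 0.05, 0.04], make_toy_trainer([0.9, 0.9, 0.0]), budget=10)` starts from a nominal success rate of 0.71 (matching the open-loop simulation figure above; the split into three failure modes is invented). The third recovery never improves, so after the warm-up round it receives no further training, and the remaining budget goes to the recoveries that are still making progress.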