Recent progress in end-to-end Imitation Learning approaches has shown promising results and generalization capabilities on mobile manipulation tasks. Such models are increasingly deployed in real-world settings, where scaling up requires robots to operate with high autonomy, i.e., with as little human supervision as possible. To avoid the need for one-on-one human supervision, robots must be able to detect and prevent policy failures ahead of time and ask for help, allowing a remote operator to supervise multiple robots and intervene when needed. However, the black-box nature of end-to-end Imitation Learning models such as Behavioral Cloning, together with the lack of an explicit state-value representation, makes failures difficult to predict. To this end, we introduce Behavioral Cloning Value Approximation (BCVA), an approach for learning a state value function that is built on, and trained jointly with, a Behavioral Cloning policy and can be used to predict failures. We demonstrate the effectiveness of BCVA on the challenging mobile manipulation task of latched-door opening, showing that it identifies failure scenarios with 86% precision and 81% recall, evaluated on over 2000 real-world runs, an improvement of 10 percentage points over a simple failure-classification baseline.
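The abstract does not state how BCVA's value estimates are turned into failure predictions. One plausible reading, assumed here purely for illustration, is to flag a run as a likely failure when its estimated state value drops below a threshold, and then score those flags with precision and recall, treating failure as the positive class. The record type `Run`, the helpers `flag_failures` and `precision_recall`, the threshold of 0.5, and the toy data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Run:
    """One evaluation episode (hypothetical record format, not from the paper)."""
    predicted_value: float  # assumed value estimate in [0, 1]; higher means success is more likely
    failed: bool            # ground-truth outcome of the run


def flag_failures(runs: List[Run], threshold: float) -> List[bool]:
    """Assumed decision rule: predict a failure when the value estimate is below `threshold`."""
    return [r.predicted_value < threshold for r in runs]


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall with 'failure' as the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # correctly flagged failures
    fp = sum(p and not a for p, a in zip(predicted, actual))  # successes wrongly flagged
    fn = sum(a and not p for p, a in zip(predicted, actual))  # failures that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data for illustration only; these are NOT results from the paper.
    runs = [
        Run(0.10, True),
        Run(0.35, True),
        Run(0.40, False),
        Run(0.70, True),
        Run(0.85, False),
        Run(0.90, False),
    ]
    flags = flag_failures(runs, threshold=0.5)
    p, r = precision_recall(flags, [run.failed for run in runs])
    print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.67 recall=0.67"
```

Raising the threshold flags more runs, which tends to increase recall at the cost of precision; the operating point reported in the abstract would correspond to one such trade-off, however the authors actually chose it.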