We introduce a new task, Adaptable Error Detection (AED), which aims to identify behavior errors in few-shot imitation (FSI) policies from visual observations in novel environments. Because such errors can cause serious damage to the surroundings, FSI policies see limited use in real-world scenarios. A robust system is therefore needed to notify operators when an FSI policy behaves inconsistently with the intent of the demonstrations. The task poses three challenges: (1) detecting behavior errors in novel environments, (2) identifying behavior errors that occur without producing notable changes, and (3) detecting errors without complete temporal information about the rollout, since detection must run online. Existing benchmarks cannot support the development of AED because their tasks do not present all of these challenges. To this end, we develop a cross-domain AED benchmark consisting of 322 base and 153 novel environments. We also propose Pattern Observer (PrObe) to address these challenges. PrObe is equipped with a pattern extractor and guided by novel learning objectives to parse discernible patterns in the policy feature representations of normal or error states. In our comprehensive evaluation, PrObe detects errors arising from a wide range of FSI policies and consistently surpasses strong baselines. Moreover, detailed ablations and a pilot study on error correction validate the effectiveness of the proposed architecture design and the practicality of the AED task, respectively.
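The online-detection constraint in challenge (3) can be illustrated with a minimal sketch. This is not PrObe and is not taken from the paper: the class `OnlineErrorDetector`, the helper `first_alarm`, the `reference` feature, and the threshold and smoothing values are all hypothetical, chosen only to show a detector that decides at each timestep using the frames seen so far and never the future of the rollout.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Optional, Sequence


@dataclass
class OnlineErrorDetector:
    """Hypothetical causal detector: each decision uses past frames only."""

    reference: Sequence[float]  # assumed: a feature summarizing the demonstrations
    threshold: float = 1.0      # assumed alarm threshold
    smoothing: float = 0.5      # weight of the newest distance in the running score
    score: float = field(default=0.0, init=False)

    def step(self, feature: Sequence[float]) -> bool:
        """Consume one policy feature vector; return True if an error is flagged."""
        # Euclidean distance between the current feature and the reference.
        dist = sqrt(sum((f - r) ** 2 for f, r in zip(feature, self.reference)))
        # Exponential moving average, so the score depends only on the history.
        self.score = self.smoothing * dist + (1 - self.smoothing) * self.score
        return self.score > self.threshold


def first_alarm(
    detector: OnlineErrorDetector, rollout: List[Sequence[float]]
) -> Optional[int]:
    """Feed the rollout one step at a time; return the first flagged timestep."""
    for t, feature in enumerate(rollout):
        if detector.step(feature):
            return t  # the operator would be notified here, mid-rollout
    return None
```

With `reference=[0.0, 0.0]`, a rollout whose third feature jumps to `[3.0, 4.0]` is flagged at timestep 2, before the remaining frames are seen. A learned detector would replace the fixed distance with a trained scoring function, but the step-by-step interface stays the same.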