We introduce a new task called Adaptable Error Detection (AED), which aims to identify behavior errors of few-shot imitation (FSI) policies from visual observations in novel environments. Because FSI policies can cause serious damage to their surroundings, their application in real-world scenarios remains limited. A robust system is therefore needed to notify operators when an FSI policy's behavior is inconsistent with the intent of the demonstrations. AED poses three challenges: (1) detecting behavior errors in novel environments, (2) identifying behavior errors that occur without noticeable changes in the observations, and (3) working without complete temporal information of the rollout, since detection must be performed online. Existing benchmarks cannot support the development of AED because their tasks do not present all of these challenges. To this end, we develop a cross-domain AED benchmark consisting of 322 base and 153 novel environments. We also propose Pattern Observer (PrObe) to address these challenges. PrObe is equipped with a powerful pattern extractor and guided by novel learning objectives to parse discernible patterns in the policy's feature representations of normal and error states. In a comprehensive evaluation, PrObe consistently outperforms strong baselines at detecting errors arising from a wide range of FSI policies. Finally, we conduct detailed ablations to validate the proposed architecture design and a pilot study on error correction to demonstrate the practicality of the AED task.
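Challenge (3) means a detector must decide at every step using only the rollout prefix observed so far. The sketch below illustrates that online protocol under our own assumptions; it is not PrObe. `detect_online` and `score_fn` are hypothetical names: `score_fn` stands in for a learned per-step error scorer over policy features, and the exponential smoothing and fixed threshold are illustrative choices.

```python
from typing import Callable, Iterable, Optional, Sequence


def detect_online(
    features: Iterable[Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
    momentum: float = 0.8,
) -> Optional[int]:
    """Return the first timestep flagged as an error, or None if none is flagged.

    Hypothetical sketch: at step t only features[0..t] have been consumed,
    so the decision never depends on future frames (online detection).
    """
    smoothed = 0.0  # running summary of the prefix seen so far
    for t, feat in enumerate(features):
        raw = score_fn(feat)  # assumed per-step error score in [0, 1]
        # Exponential moving average to damp single-step noise.
        smoothed = momentum * smoothed + (1.0 - momentum) * raw
        if smoothed >= threshold:
            return t  # alert the operator at this step
    return None


if __name__ == "__main__":
    # Toy rollout: the first feature entry is treated directly as an error score.
    rollout = [[0.1], [0.1], [0.9], [0.9], [0.9], [0.9]]
    print(detect_online(rollout, lambda f: f[0], threshold=0.5, momentum=0.5))
```

With these toy inputs the smoothed score crosses the threshold one step after the errors begin, which shows the trade-off the smoothing introduces: more robustness to single-step noise at the cost of detection delay.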