Error detection in procedural activities is essential for consistent and correct outcomes in AR-assisted and robotic systems. Existing methods often focus on temporal ordering errors or rely on static prototypes to represent normal actions. However, these approaches typically overlook the common scenario in which several distinct actions are all valid continuations of a given sequence of executed actions. This leads to two issues: (1) static prototypes cannot detect errors effectively when the inference environment or the distribution of action execution differs from training; and (2) the model may compare against the wrong prototype when the label of the ongoing action differs from the predicted one. To address these issues, we propose an Adaptive Multiple Normal Action Representation (AMNAR) framework. AMNAR predicts all valid next actions and reconstructs their corresponding normal action representations, which are compared against the ongoing action to detect errors. Extensive experiments demonstrate that AMNAR achieves state-of-the-art performance, highlighting the importance of modeling multiple valid next actions in error detection. The code is available at https://github.com/iSEE-Laboratory/AMNAR.
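The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the AMNAR implementation (see the linked repository for that). It assumes the valid next actions and their normal representations have already been produced by some upstream model, and it assumes cosine distance, a minimum over candidates, and a fixed threshold as the decision rule. The names `cosine_distance` and `detect_error`, the toy vectors, and the threshold value are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def detect_error(
    ongoing: Vector,
    normal_reps: Dict[str, Vector],
    threshold: float,
) -> Tuple[bool, str, float]:
    """Compare the ongoing action with every valid next action.

    `normal_reps` maps each valid next action (hypothetical labels) to its
    normal representation. The ongoing action is flagged as an error only
    if it is far from all of them, so any valid continuation is accepted.
    Returns (is_error, closest_action, distance_to_closest).
    """
    if not normal_reps:
        raise ValueError("need at least one valid next action")
    closest_action, distance = min(
        ((name, cosine_distance(ongoing, rep)) for name, rep in normal_reps.items()),
        key=lambda pair: pair[1],
    )
    return distance > threshold, closest_action, distance


# Toy usage: two valid next actions with made-up 2-D representations.
normal_reps = {"pour water": [1.0, 0.0], "add coffee": [0.0, 1.0]}
print(detect_error([0.9, 0.1], normal_reps, threshold=0.2))    # near "pour water"
print(detect_error([-1.0, -1.0], normal_reps, threshold=0.2))  # far from both
```

Taking the minimum over all candidates, rather than comparing against a single predicted action, is what keeps a valid but less likely continuation from being flagged as an error in this sketch.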