Ensuring safety and reliability in human-robot interaction (HRI) requires the timely detection of unexpected events that could lead to system failures or unsafe behaviours. Anomaly detection therefore plays a critical role in enabling robots to recognize and respond to deviations from normal operation during collaborative tasks. While reconstruction-based models have been actively explored in HRI, approaches that operate directly on feature vectors have received little attention. In this work, we propose MADRI, a framework that first transforms video streams into semantically meaningful feature vectors and then performs reconstruction-based anomaly detection on them. We further augment these visual feature vectors with readings from the robot's internal sensors and with a scene graph, enabling the model to capture both external anomalies in the visual environment and internal failures within the robot itself. To evaluate our approach, we collected a custom dataset of a simple pick-and-place robotic task performed under normal and anomalous conditions. Experimental results demonstrate that reconstruction of vision-based feature vectors alone is effective for detecting anomalies, while incorporating additional modalities further improves detection performance, highlighting the benefits of multimodal feature reconstruction for robust anomaly detection in human-robot collaboration.