Black-box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when stakeholders, such as engineers and legislative bodies, lack insight into the neural networks' decision-making process. Existing explainable AI methods are tailored primarily to natural language processing and computer vision, and they fall short in two critical respects when applied to robots: grounding the explanations in decision-making tasks and assessing the trustworthiness of those explanations. In this paper, we introduce a trustworthy explainable robotics technique that attributes a neural network's decisions to human-interpretable, high-level concepts. The proposed technique produces explanations, each with an associated uncertainty score, by matching the neural network's activations against human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating its effectiveness as a post-hoc, human-friendly robot diagnostic tool.
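The abstract only summarizes the mechanism at a high level. As a purely illustrative aid, and not the paper's actual method, the sketch below shows one common way concept-based attribution with an uncertainty estimate can be realized: a TCAV-style score computed on a toy network, with bootstrap resampling of the concept examples supplying the uncertainty. The toy architecture, the data, and all names (e.g. `concept_score`) are hypothetical assumptions for illustration.

```python
# Illustrative sketch only (not the paper's method): TCAV-style concept
# attribution with a bootstrap uncertainty estimate on a toy NumPy network.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy decision-making network: one hidden layer, scalar "decision" head ---
W1 = rng.normal(size=(16, 8)); b1 = np.zeros(8)   # input -> probed activations
W2 = rng.normal(size=(8,));    b2 = 0.0           # activations -> decision score

def hidden(x):
    # Activations at the layer we probe for concepts.
    return np.tanh(x @ W1 + b1)

def decision(h):
    # Linear decision head applied to the probed activations.
    return h @ W2 + b2

def grad_decision_wrt_hidden(h):
    # For this linear head, d(decision)/d(hidden) is simply W2 for every input.
    return np.broadcast_to(W2, h.shape)

def concept_vector(pos_acts, neg_acts):
    # Simplest possible concept activation vector: normalized difference of
    # class means (a linear classifier's weight vector is the usual choice).
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)

def concept_score(inputs, pos_examples, neg_examples):
    # TCAV-style score: fraction of inputs whose decision gradient points
    # in the direction of the concept vector.
    cav = concept_vector(hidden(pos_examples), hidden(neg_examples))
    grads = grad_decision_wrt_hidden(hidden(inputs))
    return float(np.mean(grads @ cav > 0))

def concept_score_with_uncertainty(inputs, pos_examples, neg_pool, n_boot=20):
    # Uncertainty estimate: bootstrap-resample the concept and counterexample
    # sets and report the spread of the resulting attribution scores.
    scores = []
    for _ in range(n_boot):
        pos = pos_examples[rng.choice(len(pos_examples), size=len(pos_examples))]
        neg = neg_pool[rng.choice(len(neg_pool), size=len(pos_examples))]
        scores.append(concept_score(inputs, pos, neg))
    return float(np.mean(scores)), float(np.std(scores))

# Hypothetical data: robot observations, concept examples, and random images.
inputs = rng.normal(size=(64, 16))
concept_examples = rng.normal(loc=0.5, size=(32, 16))
random_pool = rng.normal(size=(200, 16))

mean, std = concept_score_with_uncertainty(inputs, concept_examples, random_pool)
print(f"concept attribution = {mean:.2f} +/- {std:.2f}")
```

In this reading, a high mean score with a small standard deviation would indicate a concept that reliably contributes to the robot's decision, while a large standard deviation flags an explanation that should not be trusted, which is the kind of uncertainty-aware diagnostic the abstract describes.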