Explainable artificial intelligence (XAI) aims to help uncover flaws in an AI model's internal representations. But do people draw the right conclusions from its explanations? Specifically, do they recognize an AI's inability to distinguish between relevant and irrelevant features? In the present study, a simulated AI classified images of railway trespassers as dangerous or not. To explain which features it had used, other images from the dataset that activated the AI in a similar way were shown as concept images. These concept images varied in three relevant features (i.e., a person's distance to the tracks, direction of movement, and action) and in one irrelevant feature (i.e., scene background). When the AI used a feature in its decision, that feature was retained in the concept images; otherwise, the images varied randomly over it (e.g., same distance, varied backgrounds). Participants rated the AI more favorably when it retained relevant features. For the irrelevant feature, they were largely indifferent and sometimes even preferred it to be retained. This suggests that people may not recognize when an AI model relies on irrelevant features to make its decisions.