We investigate a formalism for the conditions of a successful explanation of AI. We consider "success" to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. Theory of mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide a "language" with which humans understand behavior. We use these folk concepts as a framework of *social attribution* by the human explainee -- the information constructs that humans are likely to comprehend from explanations -- by introducing a blueprint for an explanatory narrative (Figure 1) that explains AI behavior with these constructs. We then demonstrate, in a qualitative evaluation, that many XAI methods today can be mapped to folk concepts of behavior. This mapping allows us to uncover the failure modes that prevent current methods from explaining successfully -- i.e., the information constructs that are missing from a given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior.