We investigate a formalism for the conditions of a successful explanation of AI. We consider "success" to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. The theory-of-mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide a "language" with which humans understand behavior. We use these folk concepts as a framework of social attribution by the human explainee (the information constructs that humans are likely to comprehend from explanations) by introducing a blueprint for an explanatory narrative (Figure 1) that explains AI behavior with these constructs. We then demonstrate, in a qualitative evaluation, that many XAI methods today can be mapped to folk concepts of behavior. This allows us to uncover the failure modes that prevent current methods from explaining successfully, i.e., the information constructs that are missing for any given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior.