Explainable AI (XAI) aims to answer ethical and legal questions associated with the deployment of AI models. However, a considerable number of domain-specific reviews highlight the need for a mathematical foundation for the key notions in the field, noting that even the term "explanation" still lacks a precise definition. These reviews also advocate for a sound and unifying formalism for explainable AI, to avoid the emergence of ill-posed questions and to help researchers navigate a rapidly growing body of knowledge. To the authors' knowledge, this paper is the first attempt to fill this gap by formalizing a unifying theory of XAI. Employing the framework of category theory, and feedback monoidal categories in particular, we first provide formal definitions for all essential terms in explainable AI. We then propose a taxonomy of the field following the proposed structure, showing how the introduced theory can be used to categorize all the main classes of XAI systems currently studied in the literature. In summary, the foundation of XAI proposed in this paper offers a significant tool for properly framing future lines of research, and valuable guidance for new researchers approaching the field.