Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, which is critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the notion of a compositional model, which views a model in terms of formal string diagrams that capture its abstract structure together with its concrete implementation. This comprehensive view incorporates deterministic, probabilistic and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next we define the interpretation of a model in terms of its compositional structure, demonstrate how to analyse a model's interpretability, and use this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads us to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We next demonstrate the explainability benefits of CI models. Firstly, their compositional structure may allow the computation of other quantities of interest, and may facilitate inference from the model to the modelled phenomenon by matching its structure. Secondly, they allow for diagrammatic explanations of their behaviour, based on influence constraints, diagram surgery and rewrite explanations. Finally, we discuss many future directions for the approach, raising the question of how to learn such meaningfully structured models in practice.
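To make the central notions concrete, the following is a minimal Python sketch of a compositional model and of diagram surgery, under our own simplifying assumptions: a diagram is reduced to a sequential pipeline of typed boxes, and a "functor" is just a dictionary assigning each box name a concrete function. The names `Box`, `Diagram`, `evaluate` and `surgery` are illustrative and not the paper's formalism; full string-diagram tooling would also support parallel (tensor) composition and richer semantic categories.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Box:
    """A named process with typed input and output wires."""
    name: str
    dom: Tuple[str, ...]  # input wire types
    cod: Tuple[str, ...]  # output wire types

@dataclass
class Diagram:
    """Abstract structure: a sequential composite of boxes, independent of semantics."""
    boxes: List[Box]

    def then(self, other: "Diagram") -> "Diagram":
        # Sequential composition is only defined when the wire types match.
        assert self.boxes[-1].cod == other.boxes[0].dom, "type mismatch"
        return Diagram(self.boxes + other.boxes)

def evaluate(diagram: Diagram, semantics: Dict[str, Callable], x):
    """Concrete implementation: interpret each box as a function and compose them."""
    for box in diagram.boxes:
        x = semantics[box.name](x)
    return x

def surgery(diagram: Diagram, old: str, new: Box) -> Diagram:
    """Diagram surgery: swap out one box while keeping the surrounding structure."""
    return Diagram([new if b.name == old else b for b in diagram.boxes])

# Usage: a toy two-stage model, then an intervention on its classifier.
embed = Box("embed", ("text",), ("vec",))
classify = Box("classify", ("vec",), ("label",))
model = Diagram([embed]).then(Diagram([classify]))

semantics = {
    "embed": lambda s: [len(s), s.count("a")],
    "classify": lambda v: "long" if v[0] > 5 else "short",
}
print(evaluate(model, semantics, "banana"))  # 'long'

flip = Box("flip", ("vec",), ("label",))
semantics["flip"] = lambda v: "short" if v[0] > 5 else "long"
print(evaluate(surgery(model, "classify", flip), semantics, "banana"))  # 'short'
```

The point of the sketch is the separation it enforces: the `Diagram` records only which processes are wired together, while `semantics` carries the implementation, so explanations such as surgery act on the structure and can be re-evaluated against any choice of semantics.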