Explainable machine learning provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When generating explanations for several classes, reasoning over them to obtain a complete view may be difficult since they can present competing or contradictory evidence. To address this issue, we introduce a novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees -- called LIMEtree -- which offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. In addition to strong fidelity guarantees, our implementation supports (interactive) customisation of the explanatory insights and delivers a range of diverse explanation types, including counterfactual statements favoured in the literature. We evaluate our algorithm with a collection of quantitative experiments, a qualitative analysis based on explainability desiderata and a preliminary user study on an image classification task, comparing it to LIME. Our contributions demonstrate the benefits of multi-class explanations and the wide-ranging advantages of our method across a diverse set of scenarios.
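To make the idea of a multi-output tree surrogate concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical three-class `black_box` defined over three binary interpretable features (for example, image segments switched on or off), enumerates every perturbation instead of sampling them, and omits the distance-based sample weighting that LIME-style surrogates normally use. The helper names `fit_tree` and `predict` are illustrative only.

```python
import itertools
import math


def black_box(z):
    """Hypothetical 3-class model over 3 binary features (an assumption)."""
    scores = [2.0 * z[0] + z[1], 2.0 * z[2], 1.0 - z[0]]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(rows):
    """Column-wise mean of a list of equally long vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sse(rows):
    """Squared error around the mean vector, summed over all outputs."""
    mu = mean_vector(rows)
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, mu))


def partition(X, Y, feature, value):
    """Keep the samples whose given binary feature equals value."""
    pairs = [(x, y) for x, y in zip(X, Y) if x[feature] == value]
    return [x for x, _ in pairs], [y for _, y in pairs]


def fit_tree(X, Y, depth):
    """Greedy multi-output regression tree on binary features."""
    if depth == 0 or len(X) < 2:
        return {"leaf": mean_vector(Y)}
    best = None
    for f in range(len(X[0])):
        _, y_off = partition(X, Y, f, 0)
        _, y_on = partition(X, Y, f, 1)
        if not y_off or not y_on:
            continue  # this feature does not separate the samples
        cost = sse(y_off) + sse(y_on)
        if best is None or cost < best[0]:
            best = (cost, f)
    if best is None:
        return {"leaf": mean_vector(Y)}
    f = best[1]
    x_off, y_off = partition(X, Y, f, 0)
    x_on, y_on = partition(X, Y, f, 1)
    return {
        "feature": f,
        "off": fit_tree(x_off, y_off, depth - 1),
        "on": fit_tree(x_on, y_on, depth - 1),
    }


def predict(tree, x):
    """Follow the splits to a leaf and return its class-probability vector."""
    while "leaf" not in tree:
        tree = tree["on"] if x[tree["feature"]] == 1 else tree["off"]
    return tree["leaf"]


# Every on/off combination of the three interpretable features.
X = [list(z) for z in itertools.product([0, 1], repeat=3)]
Y = [black_box(z) for z in X]

tree = fit_tree(X, Y, depth=3)
instance = [1, 1, 1]  # the explained instance: all features present
print(predict(tree, instance))
```

Because one tree predicts all classes jointly, every leaf stores a single vector of class probabilities, so the explanations for different classes share the same feature splits. In this toy setting a tree of depth 3 isolates each of the eight perturbations and therefore reproduces the black box on them, whereas a shallower tree trades fidelity for a smaller explanation.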