Explainable AI (XAI) has developed along two distinct research directions: post-hoc methods, which explain the predictions of a pre-trained black-box model, and self-explainable models (SEMs), which are trained directly to provide explanations alongside their predictions. Although SEMs are preferred in most safety-critical scenarios, post-hoc approaches have so far received most of the attention, owing to their simplicity and their ability to explain base models without retraining. Current SEMs, by contrast, require complex architectures and heavily regularized loss functions, and therefore need specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose KMEx (K-Means Explainer), a simple yet efficient universal method that can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to promote more transparent deep learning-based decision-making through class-prototype-based explanations that are guaranteed to be diverse and trustworthy, without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs in an extensive qualitative evaluation that highlights the strengths and weaknesses of each model, paving the way toward a more reliable and objective evaluation of SEMs.
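The abstract does not spell out how KMEx builds its prototypes, so the following is only a minimal sketch of one plausible reading: embeddings from a frozen, pre-trained encoder are clustered per class with k-means, the centroids serve as class prototypes, and a new input is assigned the label of its nearest prototype. The function names (`kmeans`, `fit_prototypes`, `predict`), the toy 2-D embeddings, and the nearest-prototype rule are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; assumes len(points) >= k. Returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def fit_prototypes(embeddings, labels, k):
    """Run k-means separately on each class; the encoder itself is never retrained."""
    by_class = {}
    for z, y in zip(embeddings, labels):
        by_class.setdefault(y, []).append(z)
    return {y: kmeans(zs, k) for y, zs in by_class.items()}


def predict(z, prototypes):
    """Return (label, prototype) for the class prototype closest to embedding z."""
    return min(
        ((y, p) for y, protos in prototypes.items() for p in protos),
        key=lambda pair: sq_dist(z, pair[1]),
    )


# Toy 2-D "embeddings" standing in for the output of a frozen encoder.
embeddings = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9),
              (5.0, 5.0), (5.2, 5.1), (6.0, 6.0), (6.1, 5.9)]
labels = ["a"] * 4 + ["b"] * 4

prototypes = fit_prototypes(embeddings, labels, k=2)
label, prototype = predict((0.1, 0.0), prototypes)
```

In this sketch the returned `prototype` doubles as the explanation for the prediction: it is the class representative that the input most resembles in embedding space.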