Counterfactual explanations (CEs) aim to enhance the interpretability of machine learning models by illustrating how alterations in input features would affect the resulting predictions. Common CE approaches require an additional model and are typically constrained to binary counterfactuals. In contrast, we propose a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE). This approach offers inherent interpretability by enabling the generation of CEs and the continuous visualization of the model's internal representation across decision boundaries. Our method leverages the DAE's ability to encode images into a semantically rich latent space in an unsupervised manner, eliminating the need for labeled data or separate feature extraction models. We show that these latent representations are useful for classifying medical conditions and for ordinal regression of pathology severity, such as vertebral compression fractures (VCF) and diabetic retinopathy (DR). Beyond binary CEs, our method supports the visualization of ordinal CEs using a linear model, providing deeper insight into the model's decision-making process. Experiments across various medical imaging datasets demonstrate the method's advantages in interpretability and versatility. The linear manifold of the DAE's latent space allows for meaningful interpolation and manipulation, making it a powerful tool for exploring medical image properties. Our code is available at https://doi.org/10.5281/zenodo.13859266.
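To make the latent-space scheme concrete, the following is a minimal sketch of how a linear decision boundary in a DAE latent space can drive counterfactual generation. The `dae.encode` and `dae.decode` calls are hypothetical placeholders for a pretrained Diffusion Autoencoder, and the traversal loop is an assumed illustration of the general idea described above, not the authors' exact implementation.

```python
# Sketch: counterfactuals by traversing a linear boundary in a DAE latent space.
# Assumes hypothetical `dae.encode(image)` / `dae.decode(z)` from a pretrained DAE.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_latent_classifier(latents: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """Fit a linear decision boundary in the DAE latent space."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(latents, labels)
    return clf

def counterfactual_latent(z: np.ndarray, clf: LogisticRegression,
                          target: int, step: float = 0.05,
                          max_steps: int = 200) -> np.ndarray:
    """Move the latent code along the boundary normal until the
    prediction flips to `target` (or the step budget runs out)."""
    w = clf.coef_.ravel()
    direction = w / np.linalg.norm(w)   # unit normal of the hyperplane
    if target == 0:                     # walk toward the negative side instead
        direction = -direction
    z_cf = z.copy()
    for _ in range(max_steps):
        if clf.predict(z_cf[None, :])[0] == target:
            break
        z_cf = z_cf + step * direction
    return z_cf

# Hypothetical usage: decoding interpolations between z and z_cf visualizes
# the continuous transition across the decision boundary (ordinal CEs).
# zs = np.stack([dae.encode(img) for img in images])
# clf = fit_latent_classifier(zs, labels)
# z_cf = counterfactual_latent(zs[0], clf, target=1)
# counterfactual_image = dae.decode(z_cf)
```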