Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.
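The two-stage design described above (class-specific attribution maps, then a logistic regression classifier that sees only those maps) can be illustrated with a minimal sketch. This is not the authors' implementation: the attribution maps are random placeholders standing in for the output of the counterfactual-based generator, and the function name `predict_from_attribution_maps`, the array shapes, and the choice of one independent weight vector per class over the flattened map are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_from_attribution_maps(maps, weights, biases):
    """Per-class logistic regression on class-specific attribution maps.

    maps:    (C, H, W) array, one attribution map per class (assumed given).
    weights: (C, H*W) array, one weight vector per class (hypothetical layout).
    biases:  (C,) array.
    Returns a (C,) array of independent per-class probabilities (multi-label).
    """
    num_classes = maps.shape[0]
    flat = maps.reshape(num_classes, -1)
    # Each class logit depends only on that class's own attribution map.
    logits = np.einsum("cd,cd->c", flat, weights) + biases
    return sigmoid(logits)


# Placeholder inputs: 3 findings, 8x8 maps. A real model would produce the
# maps with a learned generator; random values are used here only as a stand-in.
rng = np.random.default_rng(0)
maps = rng.normal(size=(3, 8, 8))
weights = rng.normal(size=(3, 64))
biases = np.zeros(3)

probs = predict_from_attribution_maps(maps, weights, biases)
```

Because each logit is a linear function of a single class's map, the prediction for one finding is unaffected by the maps of the other findings, which is the property that lets the attribution map serve directly as the explanation for that class in this sketch.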