Characterizing the Interpretability of Attention Maps in Digital Pathology

Interpreting the decisions of machine learning models is crucial in high-risk applications such as healthcare. In digital pathology, large whole slide images (WSIs) are decomposed into smaller tiles, and tile-derived features are processed by attention-based multiple instance learning (ABMIL) models to predict WSI-level labels. These networks generate tile-specific attention weights, which can be visualized as attention maps for interpretability. However, a standardized evaluation framework for these maps is lacking, which calls into question their reliability and their ability to detect spurious correlations that can mislead models. We propose a framework to assess the ability of attention networks to attend to relevant features in digital pathology by creating artificial model confounders and using dedicated interpretability metrics. Models are trained and evaluated on data in which tile modifications are correlated with WSI labels, enabling analysis of model sensitivity to artificial confounders and of the accuracy of attention maps in highlighting them. Confounders are introduced either through synthetic tile modifications or through tile ablations based on specific image-based features, with the latter used to assess more clinically relevant scenarios. We also analyze the impact of varying the quantity of confounders at both the tile and WSI levels. Our results show that ABMIL models perform as desired within our framework. While attention maps generally highlight relevant regions, their robustness is affected by the type and number of confounders. Our versatile framework could be used to evaluate various methods and to explore the image-based features driving model predictions, which could aid biomarker discovery.
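
To make the setup concrete, the following is a minimal sketch of how tile-level attention weights arise in an ABMIL-style model and how one might score them against a known set of confounded tiles. It assumes a common tanh-based attention pooling formulation with untrained, random parameters; the abstract does not specify the architecture or the exact interpretability metrics, so the function names (`abmil_attention`, `attention_mass_on_confounders`, `top_k_precision`) and both metrics are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def abmil_attention(features, V, w):
    """Tanh-based attention pooling over tile features (assumed formulation).

    features: (n_tiles, d) tile-derived feature vectors
    V:        (d, h) projection matrix (hypothetical parameter)
    w:        (h,)   scoring vector (hypothetical parameter)
    Returns the per-tile attention weights and the pooled slide embedding.
    """
    scores = np.tanh(features @ V) @ w      # one scalar score per tile
    scores = scores - scores.max()          # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()       # weights sum to 1 over tiles
    slide_embedding = weights @ features    # attention-weighted average
    return weights, slide_embedding


def attention_mass_on_confounders(weights, confounded_mask):
    """Fraction of total attention placed on tiles known to be confounded."""
    return float(weights[confounded_mask].sum())


def top_k_precision(weights, confounded_mask, k):
    """Share of the k most-attended tiles that are confounded tiles."""
    top_k = np.argsort(weights)[::-1][:k]
    return float(confounded_mask[top_k].mean())


# Toy slide: 100 tiles with 16-dimensional features, 10 of them modified.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))
confounded_mask = np.zeros(100, dtype=bool)
confounded_mask[:10] = True
features[confounded_mask] += 3.0            # synthetic tile modification

V = rng.normal(size=(16, 8))
w = rng.normal(size=8)
weights, slide_embedding = abmil_attention(features, V, w)

print("attention mass on confounders:",
      attention_mass_on_confounders(weights, confounded_mask))
print("top-10 precision:", top_k_precision(weights, confounded_mask, k=10))
```

With random parameters the printed values carry no meaning; in a trained model, an attention map that reliably highlights the confounder would be expected to score high on both quantities.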
