Multiple Instance Learning (MIL) methods allow for gigapixel Whole-Slide Image (WSI) analysis with only slide-level annotations. Interpretability is crucial for safely deploying such algorithms in high-stakes medical domains. Traditional MIL methods offer explanations by highlighting salient regions. However, such spatial heatmaps provide limited insight for end users. To address this, we propose a novel, inherently interpretable WSI-classification approach that uses human-understandable pathology concepts to generate explanations. Our proposed Concept MIL model leverages recent advances in vision-language models to directly predict pathology concepts from image features. The model's predictions are obtained through a linear combination of the concepts identified on the top-K patches of a WSI, enabling inherent explanations by tracing each concept's influence on the prediction. In contrast to traditional concept-based interpretable models, our approach eliminates the need for costly human annotations by leveraging the vision-language model. We validate our method on two widely used pathology datasets: Camelyon16 and PANDA. On both datasets, Concept MIL achieves AUC and accuracy scores above 0.9, putting it on par with state-of-the-art models. We further find that 87.1% (Camelyon16) and 85.3% (PANDA) of the top 20 patches fall within the tumor region. A user study shows that the concepts identified by our model align with the concepts used by pathologists, making it a promising strategy for human-interpretable WSI classification.
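The sketch below shows one way to turn "a linear combination of the concepts identified on the top-K patches" into concrete computation. It is not the authors' implementation, and several parts are assumptions.

- **Inputs.** It assumes a CLIP-style vision-language model has already produced patch embeddings and text embeddings of concept prompts. Random arrays stand in for both.
- **Concept scores.** Each patch is scored against each concept by temperature-scaled cosine similarity. The function names and the `temperature` value are hypothetical.
- **Patch selection.** Patches are ranked by their highest per-patch class logit. This rule is hypothetical, since the abstract does not say how the top-K patches are chosen.
- **Aggregation.** Concept scores are averaged over the selected patches, which is also an assumption.

Because the classifier is linear, each concept's contribution to a class logit can be read off directly as `W[class, concept] * z[concept]`.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def concept_scores(patch_emb, concept_emb, temperature=0.07):
    # (N, C) cosine similarity between every patch and every concept prompt.
    # The temperature is a hypothetical CLIP-style scaling factor.
    return l2_normalize(patch_emb) @ l2_normalize(concept_emb).T / temperature

def concept_mil_predict(patch_emb, concept_emb, W, b, k=20):
    s = concept_scores(patch_emb, concept_emb)        # (N, C) per-patch concept scores
    patch_logits = s @ W.T + b                        # (N, num_classes) per-patch class logits
    # Hypothetical top-K rule: keep the k patches with the highest class logit.
    k = min(k, s.shape[0])
    top_idx = np.argsort(patch_logits.max(axis=1))[::-1][:k]
    z = s[top_idx].mean(axis=0)                       # (C,) slide-level concept vector
    contributions = W * z                             # (num_classes, C) per-concept influence
    logits = contributions.sum(axis=1) + b            # linear combination of concepts
    return logits, contributions, top_idx

# Usage with random stand-ins for real VLM embeddings (8 concepts, 2 classes).
rng = np.random.default_rng(0)
patch_emb = rng.normal(size=(500, 512))    # 500 patches from one slide
concept_emb = rng.normal(size=(8, 512))    # 8 concept-prompt text embeddings
W = rng.normal(size=(2, 8))                # hypothetical learned concept-to-class weights
b = np.zeros(2)
logits, contrib, top_idx = concept_mil_predict(patch_emb, concept_emb, W, b, k=20)
```

Inspecting a row of `contrib` shows which concepts pushed that class's logit up or down. Because averaging and the linear layer commute, the slide logits equal the mean of the per-patch logits over the selected patches.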