Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning

Whole Slide Images (WSI), obtained by high-resolution digital scanning of microscope slides at multiple scales, are the cornerstone of modern Digital Pathology. However, they pose a particular challenge to AI-based/AI-mediated analysis because pathology labeling is typically done at the slide level rather than the tile level. Not only are medical diagnoses recorded at the specimen level; oncogene mutations are also detected experimentally and recorded at the slide level by initiatives such as The Cancer Genome Atlas (TCGA). This poses a dual challenge: a) accurately predicting the overall cancer phenotype and b) finding out which cellular morphologies are associated with it at the tile level. To address these challenges, a weakly supervised Multiple Instance Learning (MIL) approach was explored for two prevalent cancer types, Invasive Breast Carcinoma (TCGA-BRCA) and Lung Squamous Cell Carcinoma (TCGA-LUSC). The approach was applied to tumor detection at low magnification and to TP53 mutation detection at several magnification levels. Our results show that a novel additive implementation of MIL matched the performance of the reference implementation (AUC 0.96) and was only slightly outperformed by Attention MIL (AUC 0.97). More interestingly from the perspective of the molecular pathologist, these different AI architectures show distinct sensitivities to morphological features (through the detection of Regions of Interest, RoI) at different magnification levels. Tellingly, detection of TP53 mutation was most sensitive to features at the higher magnifications, where cellular morphology is resolved.
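To illustrate how a slide-level MIL model can still point to tile-level Regions of Interest, the following is a minimal sketch of attention-based MIL pooling in the generic form commonly used in the literature. It is not the implementation evaluated here: the function name `attention_mil_pool`, the parameters `V` and `w`, the feature dimensions, and the random inputs are all hypothetical, and the additive variant is not shown.

```python
import numpy as np

def attention_mil_pool(tile_features, V, w):
    """Pool the tile features of one slide into a single slide embedding.

    tile_features: (n_tiles, d) array, one feature vector per tile.
    V: (d, h) projection matrix and w: (h,) scoring vector; both are
    hypothetical learnable parameters, randomly initialised below.
    """
    scores = np.tanh(tile_features @ V) @ w   # one raw score per tile
    scores = scores - scores.max()            # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    slide_embedding = weights @ tile_features  # attention-weighted average
    return slide_embedding, weights

# Toy slide: 6 tiles with 4-dimensional features (random placeholders).
rng = np.random.default_rng(0)
tiles = rng.normal(size=(6, 4))
V = rng.normal(size=(4, 3))
w = rng.normal(size=3)

embedding, weights = attention_mil_pool(tiles, V, w)

# Tiles ranked from highest to lowest attention: candidate RoIs.
roi_ranking = np.argsort(weights)[::-1]
```

In a sketch like this, only the slide embedding is passed to the classifier trained on slide-level labels, while the per-tile attention weights can be read afterwards as a relevance map over the slide.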
