Breast cancer is the most prevalent cancer in women worldwide. Histopathology image analysis serves as the gold standard for cancer diagnosis. In this regard, whole-slide imaging (WSI), a revolutionary technology in digital pathology, allows for ultrahigh-resolution tissue analysis. Despite its promise, WSI analysis faces significant computational challenges due to its massive data size and tissue heterogeneity. To address these challenges, we present a Gaussian mixture-based multiple instance learning (MIL) framework for WSI analysis with partially subsampled instances. Our approach models a WSI as a bag of instances (i.e., randomly cropped sub-images) and uses a bag-based maximum likelihood estimator (BMLE) to predict metastases. Furthermore, we introduce a subsampling-based maximum likelihood estimator (SMLE) that refines predictions by selectively labeling a subset of instances. Extensive evaluations on breast carcinoma metastasis prediction demonstrate that BMLE surpasses state-of-the-art methods, while SMLE further improves prediction accuracy at both the bag and instance levels. We find that our method is fairly robust to various plausible model misspecifications. Theoretical analyses and simulation studies validate the performance and robustness of our methods.
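To make the bag-of-instances idea concrete, the following is a minimal sketch of how a bag-level likelihood could be evaluated under a two-component Gaussian mixture. It is not the BMLE or SMLE described above. It assumes a toy model in which instances in a negative bag follow N(mu0, var·I), instances in a positive bag are positive with probability `pi` and then follow N(mu1, var·I), and all parameters are already known. The function names, the isotropic covariance, and the parameter names are hypothetical choices made for illustration only.

```python
import numpy as np


def log_gauss(x, mean, var):
    """Row-wise log density of an isotropic Gaussian N(mean, var * I)."""
    d = x.shape[1]
    sq_dist = np.sum((x - mean) ** 2, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi * var) + sq_dist / var)


def bag_log_likelihoods(bag, mu0, mu1, var, pi):
    """Return (log p(bag | negative), log p(bag | positive)) for the toy model.

    Assumption: a negative bag has only negative instances, while each
    instance of a positive bag is independently positive with probability pi.
    """
    l0 = log_gauss(bag, mu0, var)  # negative-component log density
    l1 = log_gauss(bag, mu1, var)  # positive-component log density
    neg = l0.sum()
    # Stable log of the mixture density (1 - pi) * phi0 + pi * phi1.
    pos = np.logaddexp(np.log(1.0 - pi) + l0, np.log(pi) + l1).sum()
    return neg, pos


def predict_bag(bag, mu0, mu1, var, pi, prior_pos=0.5):
    """Posterior probability that the bag is positive under the toy model."""
    neg, pos = bag_log_likelihoods(bag, mu0, mu1, var, pi)
    logit = (pos + np.log(prior_pos)) - (neg + np.log(1.0 - prior_pos))
    # tanh form of the sigmoid avoids overflow for large |logit|.
    return 0.5 * (1.0 + np.tanh(0.5 * logit))
```

In this sketch, each row of `bag` would be a feature vector extracted from one randomly cropped sub-image. A call such as `predict_bag(features, mu0, mu1, var=1.0, pi=0.2)` returns a score in [0, 1]. In practice the parameters would have to be estimated from labeled bags, which is the role the abstract assigns to the maximum likelihood estimators.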