Machine learning in computational pathology (CPath) often aggregates patch-level predictions from multi-gigapixel Whole Slide Images (WSIs) to generate WSI-level prediction scores for crucial tasks such as survival prediction and drug effect prediction. However, current methods do not explicitly characterize distributional differences between patch sets within WSIs. We introduce HistoKernel, a novel Maximum Mean Discrepancy (MMD) kernel that measures distributional similarity between WSIs to improve performance on downstream prediction tasks. Our comprehensive analysis demonstrates HistoKernel's effectiveness across various machine learning tasks, including retrieval (n = 9,362), drug sensitivity regression (n = 551), point mutation classification (n = 3,419), and survival analysis (n = 2,291), where it outperforms existing deep learning methods. Additionally, HistoKernel integrates multi-modal data and offers a novel perturbation-based method for patch-level explainability. This work pioneers the use of kernel-based methods for WSI-level predictive modeling, opening new avenues for research. Code is available at https://github.com/pkeller00/HistoKernel.
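To make the idea of an MMD-based similarity between WSIs concrete, the following is a minimal sketch rather than the HistoKernel implementation, which is in the linked repository. It assumes each WSI is represented as an array of patch feature vectors, uses a Gaussian (RBF) base kernel with a hand-picked bandwidth `sigma`, estimates the squared MMD with the biased (V-statistic) estimator, and converts it into a slide-level kernel as `exp(-gamma * MMD^2)`. The function names, `sigma`, `gamma`, and the exponential transform are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(x, y, sigma):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between two patch-feature sets."""
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy


def slide_kernel(slides, sigma=1.0, gamma=1.0):
    """Slide-by-slide kernel matrix with entries exp(-gamma * MMD^2).

    `slides` is a list of (n_patches_i, d) arrays; the number of patches
    may differ from slide to slide.
    """
    n = len(slides)
    k = np.ones((n, n))  # MMD^2 of a slide with itself is 0, so the diagonal is 1
    for i in range(n):
        for j in range(i + 1, n):
            d2 = max(mmd2(slides[i], slides[j], sigma), 0.0)
            k[i, j] = k[j, i] = np.exp(-gamma * d2)
    return k


if __name__ == "__main__":
    # Synthetic stand-ins for patch embeddings; real features would come
    # from a patch-level encoder.
    rng = np.random.default_rng(0)
    slide_a = rng.normal(0.0, 1.0, size=(200, 8))
    slide_b = rng.normal(0.0, 1.0, size=(150, 8))  # same distribution as slide_a
    slide_c = rng.normal(3.0, 1.0, size=(180, 8))  # shifted distribution
    print(slide_kernel([slide_a, slide_b, slide_c], sigma=3.0))
```

A precomputed matrix of this kind can be passed to any kernel method that accepts a Gram matrix, for example a support vector machine with a precomputed kernel. The cost is quadratic in the number of patches per slide pair, so a practical pipeline would likely subsample patches or use an approximation.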