Stain normalisation is thought to be a crucial preprocessing step in computational pathology pipelines. We question this belief in the context of weakly supervised whole slide image classification, motivated by the emergence of powerful feature extractors trained using self-supervised learning on diverse pathology datasets. To this end, we performed the most comprehensive evaluation of publicly available pathology feature extractors to date, involving more than 8,000 training runs across nine tasks, five datasets, three downstream architectures, and various preprocessing setups. Notably, we find that omitting stain normalisation and image augmentations does not compromise downstream slide-level classification performance, while yielding substantial savings in memory and compute. Using a new evaluation metric that facilitates relative downstream performance comparison, we identify the best publicly available extractors, and show that their latent spaces are remarkably robust to variations in stain and to augmentations such as rotation. In contrast to previous patch-level benchmarking studies, our approach emphasises clinical relevance by focusing on slide-level biomarker prediction tasks in a weakly supervised setting with external validation cohorts. Our findings stand to streamline digital pathology workflows by minimising preprocessing needs and informing the selection of feature extractors. Code and data are available at https://georg.woelflein.eu/good-features.