Digital pathology has significantly advanced disease detection and pathologist efficiency through the analysis of gigapixel whole-slide images (WSIs). In this process, each WSI is first divided into patches, a feature extractor model is applied to each patch to obtain a feature vector, and an aggregation model then processes these feature vectors to predict the WSI label. With the rapid evolution of representation learning, numerous new feature extractor models, often termed foundation models, have emerged. Traditional evaluation methods, however, rely on fixed aggregation model hyperparameters, a practice we identify as potentially biasing the results. Our study uncovers a co-dependence between feature extractor models and aggregation model hyperparameters, indicating that performance comparisons can be skewed by the choice of hyperparameters. When this co-dependence is accounted for, the performance of many current feature extractor models turns out to be notably similar. We support this finding by evaluating seven feature extractor models across three datasets with 162 aggregation model configurations. This comprehensive approach provides a more nuanced understanding of the relationship between feature extractors and aggregation models, leading to a fairer and more accurate assessment of feature extractor models in digital pathology.
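The patch, extract, aggregate pipeline described above can be sketched in a few lines of NumPy. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation. The feature extractor is a hypothetical stand-in (a fixed random linear projection in place of a pretrained foundation model), and the aggregation model is assumed to be attention-based multiple-instance learning (MIL) pooling, one common choice; the abstract does not say which aggregators were used. All function names are hypothetical, and `hidden_dim` merely stands in for the kind of aggregation hyperparameter whose interaction with the extractor the study examines.

```python
import numpy as np


def tile_into_patches(wsi: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping square patches.

    Border pixels that do not fill a whole patch are discarded.
    Returns an array of shape (num_patches, patch_size, patch_size, C),
    ordered row by row across the slide.
    """
    h, w, c = wsi.shape
    nh, nw = h // patch_size, w // patch_size
    cropped = wsi[: nh * patch_size, : nw * patch_size]
    grid = cropped.reshape(nh, patch_size, nw, patch_size, c).swapaxes(1, 2)
    return grid.reshape(nh * nw, patch_size, patch_size, c)


def make_placeholder_extractor(patch_size: int, channels: int, feat_dim: int, seed: int = 0):
    """Hypothetical stand-in for a pretrained feature extractor:
    a fixed random linear projection of the flattened patch pixels."""
    rng = np.random.default_rng(seed)
    in_dim = patch_size * patch_size * channels
    proj = rng.standard_normal((in_dim, feat_dim)) / np.sqrt(in_dim)

    def extract(patches: np.ndarray) -> np.ndarray:
        flat = patches.reshape(len(patches), -1).astype(np.float64)
        return flat @ proj  # (num_patches, feat_dim)

    return extract


def init_attention_mil(feat_dim: int, hidden_dim: int, seed: int = 0) -> dict:
    """Randomly initialise attention-MIL weights; hidden_dim is an
    example aggregation hyperparameter (assumption, not the paper's grid)."""
    rng = np.random.default_rng(seed)
    return {
        "V": rng.standard_normal((hidden_dim, feat_dim)) * 0.1,
        "w": rng.standard_normal(hidden_dim) * 0.1,
        "c": rng.standard_normal(feat_dim) * 0.1,
        "b": 0.0,
    }


def attention_mil_predict(features: np.ndarray, params: dict):
    """Pool patch features into one slide embedding and return
    (slide-level probability, per-patch attention weights)."""
    scores = np.tanh(features @ params["V"].T) @ params["w"]  # (num_patches,)
    scores = scores - scores.max()  # numerical stability for softmax
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over patches
    slide_embedding = attn @ features  # attention-weighted mean, (feat_dim,)
    logit = slide_embedding @ params["c"] + params["b"]
    return 1.0 / (1.0 + np.exp(-logit)), attn


# Toy usage: a small random array stands in for a gigapixel slide.
rng = np.random.default_rng(42)
wsi = rng.random((128, 96, 3))
patches = tile_into_patches(wsi, patch_size=32)  # 4 x 3 = 12 patches
extract = make_placeholder_extractor(patch_size=32, channels=3, feat_dim=64)
feats = extract(patches)
params = init_attention_mil(feat_dim=64, hidden_dim=32)
prob, attn = attention_mil_predict(feats, params)
```

In a real evaluation the aggregator would be trained separately for each extractor, and its hyperparameters (here only `hidden_dim`; in practice also settings such as learning rate or dropout) would be varied rather than fixed, which is the condition under which the study argues extractor comparisons become fair.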