Objective: Reproducibility is critical for translating machine learning (ML)-based solutions in computational pathology (CompPath) into practice. However, an increasing number of studies report difficulties in reproducing ML results. The NCI Imaging Data Commons (IDC) is a public repository of >120 cancer image collections, including >38,000 whole-slide images (WSIs), that is designed to be used with cloud-based ML services. Here, we explore the potential of the IDC to facilitate reproducibility of CompPath research.
Materials and Methods: The IDC realizes the FAIR principles: all images are encoded according to the DICOM standard, persistently identified, discoverable via rich metadata, and accessible via open tools. Taking advantage of this, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets from the IDC. To assess reproducibility, each experiment was run multiple times in independent but identically configured sessions of common ML services.
Results: The AUC values of different runs of the same experiment were generally consistent and of the same order of magnitude as those of a similar, previously published study. However, AUC values occasionally varied between runs by up to 0.044, indicating a practical limit to reproducibility.
Discussion and Conclusion: By realizing the FAIR principles, the IDC enables other researchers to reuse exactly the same datasets. Cloud-based ML services enable others to run CompPath experiments in an identically configured computing environment without having to own high-performance hardware. Combining the two makes it possible to approach the reproducibility limit.
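The run-to-run comparison described under Results can be illustrated with a minimal sketch. It assumes that each run yields one prediction score per test sample and that the variation is summarized as the difference between the highest and lowest AUC across runs. The function names `roc_auc` and `auc_spread` and the toy scores are hypothetical; they are not taken from the study.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_spread(
    labels: Sequence[int], runs: Sequence[Sequence[float]]
) -> Tuple[List[float], float]:
    """Return the per-run AUC values and their max-min spread."""
    aucs = [roc_auc(labels, scores) for scores in runs]
    return aucs, max(aucs) - min(aucs)


# Hypothetical toy data: two runs of the same experiment on four samples.
labels = [0, 0, 1, 1]
run_a = [0.10, 0.40, 0.35, 0.80]
run_b = [0.10, 0.30, 0.35, 0.80]

aucs, spread = auc_spread(labels, [run_a, run_b])
print(aucs, spread)
```

The pairwise formulation is quadratic in the number of samples, so it suits small illustrations only; for real test sets a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.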