The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Daniela P. Schacherer, Markus D. Herrmann, David A. Clunie, Henning Höfener, William Clifford, William J. R. Longabaugh, Steve Pieper, Ron Kikinis, Andrey Fedorov, André Homeyer

From arXiv: 13 pages, 5 figures; improved manuscript, new experiments with P100 GPU

Background and Objectives: Reproducibility is a major challenge in developing machine learning (ML)-based solutions in computational pathology (CompPath). The NCI Imaging Data Commons (IDC) provides >120 cancer image collections according to the FAIR principles and is designed to be used with cloud ML services. Here, we explore its potential to facilitate reproducibility in CompPath research. Methods: Using the IDC, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets. To assess reproducibility, the experiments were run multiple times with separate but identically configured instances of common ML services. Results: The AUC values of different runs of the same experiment were generally consistent. However, we observed small variations in AUC values of up to 0.045, indicating a practical limit to reproducibility. Conclusions: We conclude that the IDC facilitates approaching the reproducibility limit of CompPath research (i) by enabling researchers to reuse exactly the same datasets and (ii) by integrating with cloud ML services so that experiments can be run in identically configured computing environments.
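The run-to-run comparison described in the Results can be illustrated with a minimal sketch. The abstract does not show the authors' evaluation code, so everything here is an assumption made for illustration: the helper names `auc` and `auc_spread` are hypothetical, and the labels and scores are made-up placeholders, not data from the paper. The sketch computes the AUC of each run with the rank-based (Mann-Whitney) formulation and reports the largest difference between runs.

```python
# Sketch only: labels and scores are made-up placeholders, not data from the paper.

def auc(labels, scores):
    """Area under the ROC curve via the rank-based (Mann-Whitney U) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_spread(labels, runs):
    """Return (max AUC - min AUC, per-run AUCs) for repeated runs of one experiment."""
    values = [auc(labels, scores) for scores in runs]
    return max(values) - min(values), values


# Hypothetical ground truth and two hypothetical runs on the same test set.
labels = [1, 1, 1, 0, 0, 0]
runs = [
    [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],   # run A: every positive outranks every negative
    [0.9, 0.8, 0.25, 0.3, 0.2, 0.1],  # run B: one positive falls below one negative
]

spread, values = auc_spread(labels, runs)
print("AUC per run:", values)
print("max spread:", spread)
```

Because the same labels are reused for every run, any spread in this sketch comes only from differences in the predicted scores, which mirrors the setting in which identical datasets and identically configured environments still yield slightly different AUC values.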

