We developed a software pipeline for quality control (QC) of histopathology whole slide images (WSIs) that segments several kinds of regions: blur at different levels, tissue regions, tissue folds, and pen marks. Given the necessity and increasing availability of GPUs for processing WSIs, the proposed pipeline comprises multiple lightweight deep learning models to strike a balance between accuracy and speed. The pipeline was evaluated on the entire TCGA collection, the largest publicly available WSI dataset, which contains more than 11,000 histopathological images from 28 organs. Compared with a previous method that was not based on deep learning, it showed consistent improvement in segmentation results across organs. To minimize annotation effort for tissue and blur segmentation, annotated images were prepared automatically by mosaicking patches (sub-images) from various WSIs, with the patch labels identified using the patch classification tool HistoROI. Because our trained QC pipeline is general and has been tested extensively, the potential impact of this work is broad. It can be used for automated pre-processing of any WSI cohort to enhance the accuracy and reliability of large-scale histopathology image analysis for both research and clinical use. We have made the trained models, training scripts, training data, and inference results publicly available at https://github.com/abhijeetptl5/wsisegqc, which should enable the research community to use the pipeline out of the box or customize it further for new datasets and applications.
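The mosaicking step can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that patches are equally sized RGB arrays and that each patch already carries an integer class label (for example, as produced by a patch classifier such as HistoROI, which is not called here). The function name `build_mosaic`, its parameters, and the label encoding are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_mosaic(patches, labels, rows, cols, seed=0):
    """Tile randomly chosen patches into one image plus a matching label mask.

    patches: list of HxWx3 uint8 arrays, all the same size (assumption).
    labels:  list of integer class ids, one per patch (hypothetical encoding,
             e.g. 0 = background, 1 = tissue, 2 = blur).
    """
    rng = np.random.default_rng(seed)
    ph, pw = patches[0].shape[:2]
    image = np.zeros((rows * ph, cols * pw, 3), dtype=np.uint8)
    mask = np.zeros((rows * ph, cols * pw), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            i = rng.integers(len(patches))  # pick a source patch at random
            ys, xs = r * ph, c * pw
            image[ys:ys + ph, xs:xs + pw] = patches[i]
            # Every pixel of the cell inherits the patch-level label.
            mask[ys:ys + ph, xs:xs + pw] = labels[i]
    return image, mask
```

Each mosaic then serves as a pixel-level annotated image without manual outlining, since the segmentation mask is derived entirely from patch-level labels.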