Digitized histopathology glass slides, known as Whole Slide Images (WSIs), are often several gigapixels in size and contain sensitive metadata, which makes distributed processing infeasible. Moreover, artifacts in WSIs may lead to unreliable predictions when the images are fed directly to Deep Learning (DL) algorithms. Preprocessing WSIs is therefore beneficial, e.g., removing privacy-sensitive information, splitting a gigapixel medical image into tiles, and discarding diagnostically irrelevant areas. This work proposes a cloud service that parallelizes the preprocessing pipeline for large medical images. The data and model parallelization will not only improve end-to-end processing efficiency for histological tasks but also guard against reconstruction of a WSI by randomly distributing tiles across processing nodes. Furthermore, the initial steps of the pipeline will be integrated into the Jupyter-based Virtual Research Environment (VRE) so that image owners can configure and automate the execution process based on resource allocation.
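The tiling and random distribution steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tile_coordinates` and `assign_tiles_randomly`, the tile size, and the node count are all hypothetical, and the sketch works only on tile coordinates rather than pixel data. A real pipeline would read the pixels with a WSI library and would likely need more than shuffling to protect against reconstruction.

```python
import random
from typing import Dict, List, Optional, Tuple

# A tile is described by its top-left corner and its size, in pixels.
Tile = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_coordinates(width: int, height: int, tile_size: int) -> List[Tile]:
    """Split a width x height image into a grid of tiles.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height and tile_size must be positive")
    tiles: List[Tile] = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tiles.append(
                (x, y, min(tile_size, width - x), min(tile_size, height - y))
            )
    return tiles


def assign_tiles_randomly(
    tiles: List[Tile], n_nodes: int, seed: Optional[int] = None
) -> Dict[int, List[Tile]]:
    """Shuffle the tiles, then deal them out to nodes in round-robin order.

    Each node receives a random, roughly equal share of the tiles, so no
    single node holds a contiguous region of the slide by construction.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    rng = random.Random(seed)
    shuffled = list(tiles)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    assignment: Dict[int, List[Tile]] = {node: [] for node in range(n_nodes)}
    for i, tile in enumerate(shuffled):
        assignment[i % n_nodes].append(tile)
    return assignment


if __name__ == "__main__":
    # Hypothetical small image; real WSIs are far larger.
    tiles = tile_coordinates(width=1000, height=600, tile_size=256)
    plan = assign_tiles_randomly(tiles, n_nodes=4, seed=0)
    for node, node_tiles in plan.items():
        print(node, len(node_tiles))
```

Shuffling before the round-robin deal keeps the load balanced across nodes while decoupling a tile's position in the slide from the node that processes it.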