Digitized histopathology glass slides, known as Whole Slide Images (WSIs), are often several gigapixels in size and contain sensitive metadata, which makes distributed processing infeasible. Moreover, artifacts in WSIs may lead to unreliable predictions when the images are fed directly to Deep Learning (DL) algorithms. Preprocessing WSIs is therefore beneficial, e.g., removing privacy-sensitive information, splitting a gigapixel medical image into tiles, and discarding diagnostically irrelevant areas. This work proposes a cloud service that parallelizes the preprocessing pipeline for large medical images. Data and model parallelization will not only improve end-to-end processing efficiency for histological tasks but also make unauthorized reconstruction of a WSI harder by randomly distributing its tiles across processing nodes. Furthermore, the initial steps of the pipeline will be integrated into the Jupyter-based Virtual Research Environment (VRE) so that image owners can configure and automate execution according to resource allocation.
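The preprocessing steps named above (tiling, discarding diagnostically irrelevant areas, and randomly distributing tiles across nodes) can be illustrated with a minimal sketch. It is not the proposed cloud service: the image is a small in-memory grayscale grid, and the function names, the brightness threshold of 220, the 90% background fraction, and the round-robin assignment after shuffling are all assumptions made for illustration. A real pipeline would read regions from the slide file rather than hold a gigapixel image in memory.

```python
import random
from typing import Dict, List, Optional, Tuple

# A tile is (top, left, pixels); pixels is a 2D list of grayscale values 0..255.
Tile = Tuple[int, int, List[List[int]]]


def split_into_tiles(image: List[List[int]], tile_size: int) -> List[Tile]:
    """Cut a 2D grayscale image into tiles of at most tile_size x tile_size."""
    height, width = len(image), len(image[0])
    tiles: List[Tile] = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            pixels = [row[left:left + tile_size]
                      for row in image[top:top + tile_size]]
            tiles.append((top, left, pixels))
    return tiles


def is_background(pixels: List[List[int]],
                  threshold: int = 220,        # assumed brightness cut-off
                  max_fraction: float = 0.9    # assumed background fraction
                  ) -> bool:
    """Treat a tile as background if most of its pixels are near-white."""
    flat = [p for row in pixels for p in row]
    bright = sum(1 for p in flat if p >= threshold)
    return bright / len(flat) >= max_fraction


def distribute(tiles: List[Tile], n_nodes: int,
               seed: Optional[int] = None) -> Dict[int, List[Tile]]:
    """Shuffle the tiles, then deal them out to nodes in round-robin order."""
    rng = random.Random(seed)
    shuffled = list(tiles)
    rng.shuffle(shuffled)
    assignment: Dict[int, List[Tile]] = {node: [] for node in range(n_nodes)}
    for i, tile in enumerate(shuffled):
        assignment[i % n_nodes].append(tile)
    return assignment


if __name__ == "__main__":
    # Toy 8x8 "slide": left half tissue (dark), right half blank glass (white).
    image = [[100] * 4 + [255] * 4 for _ in range(8)]
    tiles = split_into_tiles(image, tile_size=4)
    kept = [t for t in tiles if not is_background(t[2])]
    plan = distribute(kept, n_nodes=2, seed=0)
    for node, node_tiles in plan.items():
        print(node, [(top, left) for top, left, _ in node_tiles])
```

Shuffling before assignment means no node receives a contiguous, predictable region of the slide. On its own this is only a partial safeguard, since each tile here still carries its coordinates; whether and how coordinates are hidden from the nodes is a design decision this sketch leaves open.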