Tissue segmentation is a routine preprocessing step that reduces the computational cost of whole slide image (WSI) analysis by excluding background regions. Traditional image processing techniques are commonly used for tissue segmentation, but they often require manual adjustment of parameter values for atypical cases, fail to exclude all slide and scanning artifacts from the background, and are unable to segment adipose tissue. Pen marking artifacts in particular can bias subsequent analyses if they are not removed. In addition, several applications require the separation of individual cross-sections, which can be challenging due to tissue fragmentation and the adjacent positioning of cross-sections. To address these problems, we develop a convolutional neural network for tissue and pen marking segmentation using a dataset of 200 H&E-stained WSIs. For separating tissue cross-sections, we propose a novel post-processing method based on clustering the predicted centroid locations of the cross-sections in a 2D histogram. On an independent test set, the model achieved a mean Dice score of 0.981$\pm$0.033 for tissue segmentation and 0.912$\pm$0.090 for pen marking segmentation. The mean absolute difference between the numbers of annotated and separated cross-sections was 0.075$\pm$0.350. Our results demonstrate that the proposed model can accurately segment H&E-stained tissue cross-sections and pen markings in WSIs while being robust to many common slide and scanning artifacts. The model, including its trained parameters, and the post-processing method are publicly available as a Python package called SlideSegmenter.
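The abstract does not specify how the centroid clustering is implemented. As a rough illustration only, the following sketch assumes the network outputs, for every tissue pixel, a predicted (y, x) location of the centroid of the cross-section that pixel belongs to. These votes are binned into a 2D histogram, local maxima of the histogram are treated as cross-section centers, and each pixel is assigned to the peak nearest its predicted centroid. The function `separate_cross_sections` and its parameters `bin_size` and `min_votes` are hypothetical and are not taken from the SlideSegmenter API.

```python
import numpy as np


def separate_cross_sections(tissue_mask, centroid_y, centroid_x,
                            bin_size=8, min_votes=50):
    """Label cross-sections by clustering predicted centroids (hypothetical sketch).

    tissue_mask: (H, W) bool array of tissue pixels.
    centroid_y, centroid_x: (H, W) float arrays; for each pixel, the assumed
        network prediction of its cross-section's centroid, in pixel coordinates.
    Returns an (H, W) int32 array: 0 = background, 1..K = cross-section labels.
    """
    h, w = tissue_mask.shape
    ys = centroid_y[tissue_mask]
    xs = centroid_x[tissue_mask]
    labels = np.zeros((h, w), dtype=np.int32)
    if ys.size == 0:
        return labels

    # Accumulate the centroid votes of all tissue pixels in a 2D histogram.
    n_y = int(np.ceil(h / bin_size))
    n_x = int(np.ceil(w / bin_size))
    hist, y_edges, x_edges = np.histogram2d(
        ys, xs, bins=[n_y, n_x], range=[[0, h], [0, w]])

    # A bin is a peak if it has at least min_votes votes and is a local
    # maximum over its 8 neighbours. Ties are broken in raster order so
    # that two equal adjacent bins yield one peak rather than two.
    padded = np.pad(hist, 1, mode="constant", constant_values=-1.0)
    is_peak = hist >= min_votes
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy, dx) == (0, 0):
                continue
            neighbour = padded[1 + dy:1 + dy + n_y, 1 + dx:1 + dx + n_x]
            if (dy, dx) < (0, 0):
                is_peak &= hist > neighbour
            else:
                is_peak &= hist >= neighbour

    peaks = np.argwhere(is_peak)  # (K, 2) bin indices in row-major order
    if len(peaks) == 0:
        return labels
    peak_y = (y_edges[peaks[:, 0]] + y_edges[peaks[:, 0] + 1]) / 2
    peak_x = (x_edges[peaks[:, 1]] + x_edges[peaks[:, 1] + 1]) / 2

    # Assign each tissue pixel to the peak nearest its predicted centroid.
    d2 = (ys[:, None] - peak_y[None, :]) ** 2 + (xs[:, None] - peak_x[None, :]) ** 2
    labels[tissue_mask] = np.argmin(d2, axis=1) + 1
    return labels
```

Grouping pixels by where they vote, rather than by spatial connectivity, is what would let two touching cross-sections receive different labels while fragments of one cross-section share a label. A practical version would likely smooth the histogram first and avoid the dense pixel-by-peak distance matrix, which grows with image size.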
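The Dice score used to report segmentation quality compares a predicted mask P with a reference mask T as 2|P ∩ T| / (|P| + |T|). A minimal NumPy sketch of the per-slide metric and the mean ± standard deviation summaries follows. The function names, the convention of returning 1.0 when both masks are empty, and the use of the population standard deviation are assumptions; the source does not state how its reported numbers were aggregated.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2 * |P & T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / total)


def mean_and_std(values):
    """Summarise per-slide values as mean and population standard deviation."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std())


def count_abs_difference(annotated_counts, separated_counts):
    """Mean and std of |annotated - separated| cross-section counts per slide."""
    diff = np.abs(np.asarray(annotated_counts) - np.asarray(separated_counts))
    return mean_and_std(diff)
```

For example, `mean_and_std([dice_score(p, t) for p, t in pairs])` would produce a mean ± std pair of the same form as those reported for tissue and pen marking segmentation, given a hypothetical list `pairs` of (prediction, annotation) masks.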