Nuclei detection and segmentation in hematoxylin and eosin-stained (H&E) tissue images are important clinical tasks and crucial for a wide range of applications. However, they are challenging because nuclei vary in staining and size, have overlapping boundaries, and form clusters. While convolutional neural networks have been extensively used for these tasks, we explore the potential of Transformer-based networks in this domain. We introduce CellViT, a new method for automated instance segmentation of cell nuclei in digitized tissue samples that uses a deep learning architecture based on the Vision Transformer. CellViT is trained and evaluated on the PanNuke dataset, one of the most challenging nuclei instance segmentation datasets, which consists of nearly 200,000 nuclei annotated into 5 clinically important classes across 19 tissue types. We demonstrate the superiority of large-scale in-domain and out-of-domain pre-trained Vision Transformers by leveraging the recently published Segment Anything Model and a ViT encoder pre-trained on 104 million histological image patches, achieving state-of-the-art nuclei detection and instance segmentation performance on the PanNuke dataset with a mean panoptic quality of 0.51 and an F1-detection score of 0.83. The code is publicly available at https://github.com/TIO-IKIM/CellViT
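For readers unfamiliar with the panoptic quality metric reported above, the following is a minimal sketch of how it is commonly computed for a single pair of instance maps. It assumes the standard definition (sum of IoUs over matched pairs divided by TP + 0.5·FP + 0.5·FN, with a match requiring IoU > 0.5). It is not the CellViT evaluation code; the function name `panoptic_quality`, the label-map convention (0 = background, positive integers = instance ids), and the return value of 0.0 for two empty maps are illustrative assumptions. The per-class and per-tissue averaging used for the PanNuke mean score is not covered.

```python
import numpy as np


def panoptic_quality(gt, pred, iou_threshold=0.5):
    """Panoptic quality for one pair of 2D instance label maps.

    Assumed convention (hypothetical, for illustration only):
    0 is background, every positive integer is one nucleus instance.
    """
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]

    matched_pred = set()
    iou_sum = 0.0
    tp = 0
    for g in gt_ids:
        g_mask = gt == g
        # Only predictions that overlap this ground-truth nucleus can match it.
        for p in np.unique(pred[g_mask]):
            if p == 0 or p in matched_pred:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou = inter / union
            # With a threshold of at least 0.5, each instance has at most
            # one partner, so the first hit is the unique match.
            if iou > iou_threshold:
                matched_pred.add(p)
                iou_sum += iou
                tp += 1
                break

    fp = len(pred_ids) - tp  # predicted nuclei without a ground-truth partner
    fn = len(gt_ids) - tp    # ground-truth nuclei that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    # Assumption: two empty maps score 0.0 rather than being undefined.
    return float(iou_sum / denom) if denom > 0 else 0.0
```

Under this definition, a prediction that recovers one of two nuclei perfectly and adds one spurious nucleus has TP = 1, FP = 1, FN = 1, so its panoptic quality is 1 / (1 + 0.5 + 0.5) = 0.5.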