StainNet: Scaling Self-Supervised Foundation Models on Immunohistochemistry and Special Stains for Computational Pathology

Foundation models trained with self-supervised learning (SSL) on large-scale histological images have significantly accelerated the development of computational pathology. These models can serve as backbones for region-of-interest (ROI) image analysis or as patch-level feature extractors for multiple instance learning (MIL) on whole-slide images (WSIs). Existing pathology foundation models (PFMs) are typically pre-trained on hematoxylin and eosin (H&E)-stained pathology images. However, immunohistochemistry (IHC) and special-stain images are also frequently used in clinical practice, and PFMs pre-trained mainly on H&E-stained images may be limited in clinical applications involving these non-H&E images. To address this issue, we propose StainNet, a collection of self-supervised foundation models based on the vision transformer (ViT) architecture and trained specifically on IHC and special-stain pathology images. StainNet contains a ViT-Small and a ViT-Base model, both trained with a self-distillation SSL approach on over 1.4 million patch images extracted from 20,231 publicly available IHC and special-stain WSIs in the HISTAI database. To evaluate the StainNet models, we conduct experiments on three in-house slide-level IHC classification tasks, three in-house ROI-level special-stain classification tasks, and two public ROI-level IHC classification tasks. We also perform ablation studies, including few-ratio learning and retrieval evaluations, and compare the StainNet models with recent larger PFMs to further highlight their strengths. The StainNet model weights are available at https://github.com/WonderLandxD/StainNet.
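
As an illustration of the patch-level, MIL-based use of such a feature extractor on WSIs, the following is a minimal sketch of attention-based MIL pooling over patch embeddings. It is not the StainNet implementation or evaluation pipeline: the function name `attention_mil_pool`, the random stand-in features, and the untrained parameters `v` and `w` are all assumptions made for illustration. The only fixed quantity is the embedding width of 384, which is the standard width of a ViT-Small.

```python
import numpy as np


def attention_mil_pool(patch_features, w, v):
    """Aggregate patch embeddings of shape (n_patches, dim) into one slide embedding.

    Hypothetical helper: `v` (dim, hidden) and `w` (hidden,) would normally be
    learned jointly with a slide-level classifier; here they are random.
    """
    # Unnormalized attention score per patch: w^T tanh(V^T h_i)
    scores = np.tanh(patch_features @ v) @ w           # (n_patches,)
    scores = scores - scores.max()                     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()    # softmax over patches
    slide_embedding = weights @ patch_features         # (dim,)
    return slide_embedding, weights


rng = np.random.default_rng(0)
n_patches, dim, hidden = 500, 384, 64                  # 384: standard ViT-Small width
# Stand-in for features that a frozen patch encoder would produce for one WSI.
patch_features = rng.normal(size=(n_patches, dim))
v = rng.normal(scale=0.05, size=(dim, hidden))
w = rng.normal(scale=0.05, size=hidden)

slide_embedding, weights = attention_mil_pool(patch_features, w, v)
```

In practice the random array would be replaced by embeddings of the tissue patches of one slide, and the resulting slide embedding would be passed to a small classifier trained with slide-level labels.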
