Annotating medical imaging datasets is costly, so fine-tuning (or transfer learning) is the most effective method for digital pathology vision applications such as disease classification and semantic segmentation. However, models pretrained on real-world images carry a texture bias, so transferring them to histopathology applications may yield underperforming models. This motivates the use of unlabeled histopathology data and self-supervised methods to learn domain-specific characteristics. Here, we test the premise that histopathology-specific pretrained models provide better initializations for pathology vision tasks, specifically gland and cell segmentation. We compare gland and cell segmentation performance using domain-specific and non-domain-specific pretrained weights. We also investigate the training dataset size at which domain-specific pretraining produces a statistically significant difference in performance, and whether domain-specific initialization improves out-of-domain performance when models are tested on distinct datasets for the same task. The results indicate that the performance gain from domain-specific pretraining depends on both the task and the size of the training dataset. With limited training data, domain-specific pretraining significantly improved gland segmentation performance, whereas models trained on cell segmentation datasets showed no improvement.
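The comparison described above (two initializations evaluated on the same test images, checked for a statistically significant difference) can be illustrated with a minimal sketch. The abstract does not name the segmentation metric or the statistical test, so both choices here are assumptions: the Dice coefficient as the per-image score and a paired sign-flip permutation test on the score differences. The function names `dice_score` and `paired_sign_flip_test` are hypothetical and are not taken from the study.

```python
import random


def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1).

    Assumption: Dice is used as the segmentation metric; the abstract
    does not state which metric the study reports.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def paired_sign_flip_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired permutation test on per-image score differences.

    Assumption: this test stands in for whatever significance test the
    study used. Under the null hypothesis the two initializations are
    exchangeable, so the sign of each paired difference is random.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each paired difference.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance guards against floating-point rounding.
        if abs(flipped / n) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

In use, one would compute `dice_score` per test image for a model initialized with domain-specific weights and for one initialized with non-domain-specific weights, then pass the two score lists to `paired_sign_flip_test`. Repeating this for models trained on progressively larger training subsets would show the dataset size at which the p-value crosses a chosen threshold.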