Deep learning models have shown immense promise in computational pathology (CPath) tasks, but their performance often degrades on unseen data because of domain shift. Addressing this requires domain generalization (DG) algorithms, yet a systematic evaluation of DG algorithms in the CPath context is lacking. This study benchmarks 30 DG algorithms on 3 CPath tasks of varying difficulty through 7,560 cross-validation runs. We evaluate these algorithms on a unified and robust platform that incorporates modality-specific techniques and recent advances such as pretrained foundation models. The experiments provide insights into the relative performance of the DG strategies. We observe that self-supervised learning and stain augmentation consistently outperform other methods, highlighting the potential of pretrained models and data augmentation. Furthermore, we introduce a new pan-cancer tumor detection dataset (HISTOPANTUM) as a benchmark for future research. This study offers guidance to researchers in selecting appropriate DG approaches for CPath tasks.