Computer-aided diagnosis (CAD) can help pathologists improve the accuracy, consistency, and repeatability of cancer diagnosis. However, CAD models trained on histopathological images from a single center (hospital) generally generalize poorly because of staining inconsistencies among centers. In this work, we propose a pseudo-data based self-supervised federated learning (FL) framework, named SSL-FL-BT, to improve both the diagnostic accuracy and the generalization of CAD models. Specifically, pseudo histopathological images are generated at each center; they retain the inherent and center-specific properties of that center's real images but contain no private information. These pseudo images are then shared with the central server for self-supervised learning (SSL). A multi-task SSL scheme is designed to learn both the center-specific information and the common inherent representation according to the data characteristics. Moreover, a novel Barlow Twins based FL (FL-BT) algorithm is proposed to improve the local training of the CAD model at each center through contrastive learning, which benefits the optimization of the global model in the FL procedure. Experimental results on three public histopathological image datasets indicate the effectiveness of the proposed SSL-FL-BT in terms of both diagnostic accuracy and generalization.
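The two building blocks named above, a Barlow Twins objective for local training and a server-side aggregation step, can be illustrated with a minimal sketch. It assumes the standard Barlow Twins loss (invariance term plus a weighted redundancy-reduction term on the cross-correlation matrix) and plain FedAvg-style weighted averaging; the abstract does not specify how FL-BT combines them, so the function names `standardize`, `barlow_twins_loss`, and `fedavg`, and the weight `lambda_offdiag`, are hypothetical choices made for illustration only.

```python
import math
import random


def standardize(batch, eps=1e-9):
    """Normalize each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in batch) / n) + eps
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in batch]


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Standard Barlow Twins objective for two views (n samples x d dimensions).

    Assumption: this is the generic loss, not necessarily the exact FL-BT variant.
    """
    n, d = len(z_a), len(z_a[0])
    za, zb = standardize(z_a), standardize(z_b)
    # Cross-correlation between dimension i of view A and dimension j of view B.
    c = [
        [sum(za[k][i] * zb[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]
    # Invariance term: pull the diagonal toward 1.
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy-reduction term: push off-diagonal entries toward 0.
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambda_offdiag * off_diag


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * s for w, s in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    view_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
    view_b = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
    print("identical views:", barlow_twins_loss(view_a, view_a))
    print("unrelated views:", barlow_twins_loss(view_a, view_b))
    print("aggregated:", fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this sketch, two identical views give a loss close to zero, while unrelated views are penalized by the invariance term. In a real system the embeddings would come from an encoder applied to two augmentations of the same image, and the aggregated vector would be the flattened model parameters.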