Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance, but centralized training requires pooling the data in one place and therefore suffers from privacy leakage. Federated learning (FL) can safeguard privacy by keeping training samples local, but existing FL-based frameworks require a large number of well-annotated training samples and numerous rounds of communication, which hinders their practicality in real-world clinical scenarios. In this paper, we propose a universal and lightweight federated learning framework, named Federated Deep-Broad Learning (FedDBL), to achieve superior classification performance with limited training samples and only one round of communication. By simply combining a pre-trained deep learning feature extractor, a fast and lightweight broad learning inference system, and a classical federated aggregation approach, FedDBL can dramatically reduce data dependency and improve communication efficiency. Five-fold cross-validation demonstrates that FedDBL greatly outperforms its competitors when only one round of communication and limited training samples are available, and that it even achieves performance comparable to that of competitors trained with multiple rounds of communication. Furthermore, owing to its lightweight design and one-round communication, FedDBL reduces the communication burden from 4.6 GB to only 276.5 KB per client with the ResNet-50 backbone at 50-round training. Since neither data nor deep models are shared across clients, the privacy issue is well addressed and model security is guaranteed, with no risk of model inversion attacks. Code is available at https://github.com/tianpeng-deng/FedDBL.
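The pipeline summarized in the abstract (frozen deep features, a lightweight broad learning classifier, and a single aggregation step) can be illustrated with a minimal sketch. This is not the FedDBL implementation, which is in the linked repository. It assumes synthetic Gaussian features in place of a pre-trained extractor, a tanh enhancement layer with a random mapping shared by all clients, a closed-form ridge regression head, and sample-count-weighted averaging as the classical aggregation. All function names and hyperparameters below are hypothetical.

```python
import numpy as np

def broad_features(z, w_enh, b_enh):
    """Concatenate input features with tanh 'enhancement' nodes."""
    h = np.tanh(z @ w_enh + b_enh)
    return np.hstack([z, h])

def fit_local_head(a, y_onehot, lam=1e-3):
    """Closed-form ridge regression: W = (A^T A + lam*I)^-1 A^T Y."""
    d = a.shape[1]
    return np.linalg.solve(a.T @ a + lam * np.eye(d), a.T @ y_onehot)

def aggregate(weights, counts):
    """Sample-count-weighted average of client weight matrices."""
    counts = np.asarray(counts, dtype=float)
    return np.tensordot(counts / counts.sum(), np.stack(weights), axes=1)

def run_one_round(n_clients=3, n_per_client=200, feat_dim=16,
                  n_enh=32, n_classes=3, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: every client uses the same random enhancement mapping.
    w_enh = rng.normal(size=(feat_dim, n_enh)) / np.sqrt(feat_dim)
    b_enh = rng.normal(size=n_enh)
    # Synthetic class centers stand in for frozen deep features.
    centers = rng.normal(scale=3.0, size=(n_classes, feat_dim))

    def sample(n):
        y = rng.integers(0, n_classes, size=n)
        z = centers[y] + rng.normal(size=(n, feat_dim))
        return z, y

    local_weights, counts = [], []
    for _ in range(n_clients):
        # Local training: no raw data leaves the client.
        z, y = sample(n_per_client)
        a = broad_features(z, w_enh, b_enh)
        local_weights.append(fit_local_head(a, np.eye(n_classes)[y]))
        counts.append(len(y))

    # Single communication round: only the small weight matrices are sent.
    w_global = aggregate(local_weights, counts)

    z_test, y_test = sample(500)
    pred = (broad_features(z_test, w_enh, b_enh) @ w_global).argmax(axis=1)
    return w_global, float((pred == y_test).mean())

if __name__ == "__main__":
    w_global, acc = run_one_round()
    print("payload bytes per client:", w_global.nbytes)
    print("test accuracy:", acc)
```

In this sketch each client uploads only one weight matrix of shape (feature dimension + enhancement nodes) × classes, which is why the per-client payload stays small compared with exchanging a full deep model over many rounds. The sizes here are toy values and are not meant to reproduce the figures reported above.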