Purpose: We apply federated learning to train an optical coherence tomography (OCT) image classifier, simulating a realistic scenario with multiple clients and statistically heterogeneous data distributions in which clients lack samples of some categories entirely. Methods: We investigate the effectiveness of two federated learning methods, FedAvg and FedProx, for training an OCT image classification model in a decentralized fashion, which addresses the privacy concerns associated with centralizing data. We partitioned a publicly available OCT dataset across multiple clients under IID and non-IID settings, conducted local training on each client's subset, and evaluated both methods in each setting. Results: Our experiments suggest that under IID settings, both methods perform on par with training on a central data pool. However, the performance of both algorithms declines as the statistical heterogeneity across client data increases, although FedProx consistently outperforms FedAvg in the more heterogeneous settings. Conclusion: Although federated learning is effective at utilizing private data across multiple medical institutions, a large number of clients and a heterogeneous distribution of labels degrade the performance of both algorithms. Notably, FedProx appears to be more robust to increased heterogeneity.
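The ingredients named in the abstract (a label-skewed non-IID partition, FedAvg aggregation, and the FedProx proximal term) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the exact partitioning scheme, so the round-robin class assignment in `partition_label_skew` is a hypothetical choice, and the function names, the `classes_per_client` parameter, and the flat weight vectors are all introduced here for illustration only.

```python
import numpy as np


def partition_label_skew(labels, num_clients, classes_per_client, seed=0):
    """Split sample indices so each client only sees `classes_per_client` classes.

    Hypothetical scheme: classes are assigned to clients round-robin, and the
    samples of each class are divided evenly among the clients that hold it.
    Assumes classes_per_client <= number of distinct classes.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    client_classes = [
        {classes[(c * classes_per_client + j) % len(classes)]
         for j in range(classes_per_client)}
        for c in range(num_clients)
    ]
    client_indices = [[] for _ in range(num_clients)]
    for cls in classes:
        holders = [c for c in range(num_clients) if cls in client_classes[c]]
        if not holders:
            continue  # no client holds this class; its samples are unused
        shuffled = rng.permutation(np.flatnonzero(labels == cls))
        for holder, chunk in zip(holders, np.array_split(shuffled, len(holders))):
            client_indices[holder].extend(chunk.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average client weights, weighted by local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))


def fedprox_local_step(w, w_global, grad_local, lr, mu):
    """One FedProx local gradient step.

    The local objective adds (mu / 2) * ||w - w_global||^2 to the client loss,
    so its gradient gains the term mu * (w - w_global), which pulls the local
    model back toward the global one. With mu = 0 this is a plain SGD step.
    """
    return w - lr * (grad_local + mu * (w - w_global))


if __name__ == "__main__":
    # Synthetic labels for illustration only: 4 classes, 10 samples each.
    labels = np.repeat(np.arange(4), 10)
    parts = partition_label_skew(labels, num_clients=4, classes_per_client=2)
    for c, idx in enumerate(parts):
        print(f"client {c}: classes {np.unique(labels[idx]).tolist()}, n={len(idx)}")
```

Lowering `classes_per_client` makes the label distribution across clients more skewed, which is one simple way to dial up the kind of heterogeneity the abstract describes. The proximal coefficient `mu` controls how strongly each client's local update is tethered to the global model.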