FedER: Federated Learning through Experience Replay and Privacy-Preserving Data Synthesis

In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data. However, recent privacy regulations hinder data sharing and, consequently, the development of machine learning-based solutions that support diagnosis and prognosis. Federated learning (FL) aims to sidestep this limitation by bringing AI-based solutions to data owners and sharing only local AI models, or parts thereof, which then need to be aggregated. However, most existing federated learning solutions are still in their infancy and show several shortcomings. These range from the lack of a reliable and effective aggregation scheme able to retain the knowledge learned locally to weak privacy preservation, since real data may be reconstructed from model updates. Furthermore, the majority of these approaches, especially those dealing with medical data, rely on a centralized distributed learning strategy that poses robustness, scalability and trust issues. In this paper we present FedER, a federated and decentralized learning strategy that exploits experience replay and generative adversarial concepts to effectively integrate features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy. FedER is tested on two tasks, tuberculosis and melanoma classification, using multiple datasets to simulate realistic non-i.i.d. medical data scenarios. Results show that our approach achieves performance comparable to standard (non-federated) learning and significantly outperforms state-of-the-art federated methods in their centralized (thus, more favourable) formulation. Code is available at https://github.com/perceivelab/FedER
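For reference, the centralized aggregation step that the abstract contrasts FedER with can be sketched as a sample-size-weighted average of locally trained parameters, in the style of FedAvg (McMahan et al.). This is not FedER's method. The `fedavg` function, the dict-of-lists parameter format and the example values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified model format: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Aggregate client models by a sample-size-weighted average (FedAvg-style).

    client_params: one parameter mapping per client, all with identical keys/lengths.
    client_sizes: number of local training samples held by each client.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Params = {}
    for name in client_params[0]:
        acc = [0.0] * len(client_params[0][name])
        for params, n in zip(client_params, client_sizes):
            weight = n / total  # clients with more data contribute more
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        aggregated[name] = acc
    return aggregated


# Two hypothetical hospitals holding different amounts of local data.
hospital_a = {"w": [1.0, 2.0], "b": [0.0]}
hospital_b = {"w": [3.0, 4.0], "b": [1.0]}
global_model = fedavg([hospital_a, hospital_b], client_sizes=[100, 300])
print(global_model)  # {'w': [2.5, 3.5], 'b': [0.75]}
```

In this setup a single server must receive every local model, which is the centralized arrangement whose robustness, scalability and trust issues the abstract raises. Under non-i.i.d. data such as the multi-center scenarios described above, plain averaging of this kind may also fail to retain what each node learned locally.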
