For healthcare datasets, it is often not possible to combine data samples from multiple sites because of ethical, privacy or logistical concerns. Federated learning makes it possible to use powerful machine learning algorithms without pooling the data. Healthcare data presents several challenges at once, such as highly siloed data, class imbalance, missing data, distribution shifts and non-standardised variables, which require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralised machine learning, because it requires distributed optimisation, communication between nodes, aggregation of models and redistribution of models. In this systematic review, we consider all papers indexed in Scopus that were published between January 2015 and February 2023 and that describe new federated learning methodologies for addressing challenges with healthcare data. We performed a detailed review of the 89 papers that fulfilled these criteria. We identified significant systemic issues throughout the literature that compromise the methodologies of many of the reviewed papers. We give detailed recommendations to help improve the quality of methodology development for federated learning in healthcare.
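To make the extra steps concrete (local optimisation at each site, aggregation of models and redistribution of models), here is a minimal sketch of one round of federated averaging (FedAvg), a widely used aggregation scheme. It is illustrative only and is not the method of any reviewed paper. It assumes that each site performs a single local gradient step using precomputed, hypothetical gradients, that models are plain dictionaries of parameter lists, and that communication between nodes is simulated by passing Python objects. The helper names `local_update`, `fedavg` and `federated_round` are hypothetical.

```python
from typing import Dict, List

# Hypothetical model representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def local_update(weights: Weights, grads: Weights, lr: float) -> Weights:
    """One gradient step at a single site (stand-in for real local training)."""
    return {k: [w - lr * g for w, g in zip(weights[k], grads[k])] for k in weights}


def fedavg(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Aggregate site models, weighting each by its number of local samples."""
    total = sum(site_sizes)
    reference = site_weights[0]
    return {
        k: [
            sum(n * w[k][i] for w, n in zip(site_weights, site_sizes)) / total
            for i in range(len(reference[k]))
        ]
        for k in reference
    }


def federated_round(
    global_weights: Weights,
    site_grads: List[Weights],
    site_sizes: List[int],
    lr: float = 0.5,
) -> Weights:
    """Redistribute the global model, train locally at each site, then aggregate.

    Only model parameters leave each site; the raw data (represented here by
    its gradients) is never pooled.
    """
    # Redistribution: each site starts from its own copy of the global model.
    local_models = [
        local_update({k: list(v) for k, v in global_weights.items()}, g, lr)
        for g in site_grads
    ]
    # Aggregation: the server combines the site models into a new global model.
    return fedavg(local_models, site_sizes)


if __name__ == "__main__":
    new_global = federated_round(
        {"w": [0.0, 0.0]},
        [{"w": [1.0, -1.0]}, {"w": [-1.0, 3.0]}],  # synthetic site gradients
        [300, 100],  # synthetic sample counts per site
    )
    print(new_global)
```

Weighting by sample count means that larger sites pull the global model further. Under class imbalance or distribution shift between sites this may be undesirable, which is one reason the healthcare setting motivates new aggregation methods.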