Federated learning (FL) enables collaborative model training in healthcare settings where pooling data might otherwise be difficult because of data-use ordinances or security and communication constraints. Training a shared model across distributed clients allows it to generalize and to learn from heterogeneous client data. FL addresses data security, privacy, and vulnerability concerns because the data itself is never shared across the nodes of a learning network. However, FL models often struggle with variable client data distributions, since they typically assume that data are independent and identically distributed (IID). As the field has grown, fairness-aware federated learning (FAFL) mechanisms have also been introduced. These mechanisms are especially significant in healthcare, where many sensitive groups and protected classes exist. In this paper, we develop a benchmark methodology for FAFL mechanisms under various heterogeneous conditions, using healthcare datasets that typically fall outside the scope of current FL benchmarks, such as medical imaging and waveform data. Our results show considerable variation in how different FAFL schemes respond to high levels of data heterogeneity. In addition, running these schemes under privacy-preserving conditions can substantially increase network communication cost and latency compared with a standard FL scheme.