Federated learning (FL) lets health diagnostic models incorporate data from a large number of personal edge devices (e.g., mobile phones) while keeping the data local to the originating devices, which largely preserves privacy. However, such a cross-device FL approach for health diagnostics still poses many challenges because of both local data imbalance (in the extreme case, a device's data consists of a single disease class) and global data imbalance (disease prevalence is generally low in a population). Since the federated server has no access to data distribution information, resolving the imbalance to obtain an unbiased model is not trivial. In this paper, we propose FedLoss, a novel cross-device FL framework for health diagnostics. In FedLoss, the federated server averages the models trained on edge devices according to their predictive loss on the local data, rather than using only the number of samples as weights. As the predictive loss better quantifies the data distribution at a device, FedLoss alleviates the impact of data imbalance. Using a real-world dataset for COVID-19 detection from respiratory sounds and symptoms, we demonstrate the advantages of FedLoss. It achieves COVID-19 detection performance competitive with a centralised model, with an AUC-ROC of 79%. It also outperforms state-of-the-art FL baselines in sensitivity and convergence speed. Our work not only demonstrates the promise of federated COVID-19 detection but also paves the way for developing a wide range of mobile health models in a privacy-preserving fashion.
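The aggregation idea described in the abstract, weighting each device's model by its local predictive loss instead of only by its sample count, can be illustrated with a minimal sketch. The abstract does not state the exact weighting rule, so everything below is an assumption made for illustration: the function names (`loss_weights`, `sample_count_weights`, `aggregate`), the softmax mapping from losses to weights, the `temperature` parameter, and the choice to give higher-loss devices larger weights are all hypothetical and may differ from the actual FedLoss formulation. Models are represented as flat lists of floats to keep the example self-contained.

```python
import math
from typing import List, Sequence


def sample_count_weights(num_samples: Sequence[int]) -> List[float]:
    """FedAvg-style weights: proportional to each device's sample count."""
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return [n / total for n in num_samples]


def loss_weights(losses: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Hypothetical loss-based weights: a softmax over local predictive losses.

    ASSUMPTION: higher local loss -> larger weight. The abstract only says the
    server weights models "according to the predictive loss"; the direction and
    functional form used here are illustrative, not taken from the paper.
    """
    if not losses:
        raise ValueError("need at least one device loss")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(losses)  # subtract the maximum for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(models: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of flat parameter vectors, one vector per device."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    dim = len(models[0])
    if any(len(m) != dim for m in models):
        raise ValueError("all models must have the same number of parameters")
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


if __name__ == "__main__":
    # Three hypothetical devices with toy two-parameter models.
    local_models = [[0.10, 0.50], [0.30, 0.20], [0.90, 0.40]]
    local_losses = [0.35, 0.40, 1.20]   # reported by each device
    local_counts = [50, 45, 5]          # the third device holds few samples

    print("sample-count weights:", sample_count_weights(local_counts))
    print("loss-based weights:  ", loss_weights(local_losses))
    print("aggregated model:    ", aggregate(local_models, loss_weights(local_losses)))
```

In this sketch the server only needs one scalar loss per device, so it never sees the local label distribution. Whether a high loss should raise or lower a device's influence is a design choice that the abstract leaves open.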