Cross-device Federated Learning for Mobile Health Diagnostics: A First Study on COVID-19 Detection

Federated learning (FL) lets health diagnostic models incorporate data from a large number of personal edge devices (e.g., mobile phones) while keeping the data local to the originating devices, which largely preserves privacy. However, such a cross-device FL approach for health diagnostics still poses many challenges because of both local data imbalance (in the extreme case, a device's local data contains a single disease class) and global data imbalance (disease prevalence is generally low in a population). Since the federated server has no access to data distribution information, correcting the imbalance to obtain an unbiased model is not trivial. In this paper, we propose FedLoss, a novel cross-device FL framework for health diagnostics. In FedLoss, the federated server averages the models trained on edge devices according to the predictive loss on the local data, rather than using only the number of samples as weights. Because the predictive loss better quantifies the data distribution at a device, FedLoss alleviates the impact of data imbalance. We validate the superiority of FedLoss on a real-world dataset for COVID-19 detection from respiratory sounds and symptoms. It achieves competitive COVID-19 detection performance compared to a centralised model, with an AUC-ROC of 79%. It also outperforms the state-of-the-art FL baselines in sensitivity and convergence speed. Our work not only demonstrates the promise of federated COVID-19 detection but also paves the way for developing a wide range of mobile health models in a privacy-preserving fashion.
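The aggregation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the sketch assumes, purely for illustration, that each device's weight is proportional to its reported local predictive loss. The function names (`fedavg_weights`, `loss_weights`, `aggregate`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fedavg_weights(num_samples: Sequence[int]) -> List[float]:
    """Sample-count weights, as in standard federated averaging."""
    total = sum(num_samples)
    return [n / total for n in num_samples]


def loss_weights(local_losses: Sequence[float]) -> List[float]:
    """Loss-based weights.

    ASSUMPTION: weights are proportional to each device's local predictive
    loss. The actual FedLoss weighting may differ in form and direction.
    """
    total = sum(local_losses)
    if total <= 0:
        # Fall back to uniform weights if every reported loss is zero.
        return [1.0 / len(local_losses)] * len(local_losses)
    return [loss / total for loss in local_losses]


def aggregate(client_params: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Weighted average of flat parameter vectors, one vector per device."""
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(dim)
    ]


# Toy round with three devices (all values are made up for illustration).
client_params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
num_samples = [80, 10, 10]      # device 0 holds most of the data
local_losses = [0.2, 0.9, 0.9]  # devices 1 and 2 fit their data worse

by_count = aggregate(client_params, fedavg_weights(num_samples))
by_loss = aggregate(client_params, loss_weights(local_losses))
```

In this toy round, sample-count weighting lets device 0 dominate the global model, while the assumed loss-based weighting shifts influence toward the devices whose local data the model fits worst. Note that the server only needs a scalar loss per device, not the device's label distribution.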