Artificial Intelligence (AI)-based analysis of large, curated medical datasets is a promising route to earlier detection, faster diagnosis, and more effective treatment using information from low-power electrocardiography (ECG) monitoring devices. However, access to sensitive medical data from diverse sources is highly restricted, since improper use, unsafe storage, or data leakage could violate a person's privacy. This work uses a privacy-preserving Federated Learning (FL) methodology to train AI models on high-definition ECG recordings from 12-lead sensor arrays, collected from six heterogeneous sources. We evaluated whether the resulting models can match the performance of state-of-the-art models trained in a Centralized Learning (CL) fashion. Moreover, we assessed the performance of our solution on independent and identically distributed (IID) and non-IID federated data. Our methodology uses machine learning techniques based on Deep Neural Networks and Long Short-Term Memory models, together with a robust data preprocessing pipeline that includes feature engineering, feature selection, and data balancing. Our AI models demonstrated performance comparable to that of models trained using CL, IID, and non-IID approaches. They also showed advantages in reduced complexity and faster training time, making them well suited for cloud-edge architectures.
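To make the FL setting concrete, the following is a minimal sketch of federated averaging over six simulated clients, with an IID and a non-IID (label-skewed) partition. It is an illustration under stated assumptions, not the method of this work: the abstract does not name the aggregation rule, so sample-weighted averaging (FedAvg) is assumed; a logistic regression on synthetic data stands in for the Deep Neural Network and Long Short-Term Memory models and for the ECG recordings; and all function names (`local_update`, `fed_avg`, `make_clients`, `accuracy`) and hyperparameters are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few full-batch gradient steps on the logistic loss for one client."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # gradient of the mean logistic loss
    return w

def fed_avg(clients, dim, rounds=20):
    """Assumed aggregation rule: average client weights by local sample count."""
    global_w = np.zeros(dim)
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        # Each client trains locally; only weights leave the client, never raw data.
        updates = [local_update(global_w, X, y) for X, y in clients]
        global_w = sum(len(y) / total * w for w, (_, y) in zip(updates, clients))
    return global_w

def make_clients(n_clients=6, n_per_client=200, dim=8, iid=True, seed=0):
    """Synthetic stand-in data, split across clients as IID or label-skewed non-IID."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=dim)
    X = rng.normal(size=(n_clients * n_per_client, dim))
    y = (X @ true_w > 0).astype(float)
    if iid:
        order = rng.permutation(len(y))        # shuffled: similar label mix per client
    else:
        order = np.argsort(y, kind="stable")   # label-sorted: clients see mostly one class
    clients = [(X[idx], y[idx]) for idx in np.array_split(order, n_clients)]
    return clients, (X, y)

def accuracy(w, X, y):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean((X @ w > 0) == (y > 0.5)))

for iid in (True, False):
    clients, (X, y) = make_clients(iid=iid)
    w = fed_avg(clients, dim=X.shape[1])
    print("IID" if iid else "non-IID", "accuracy:", accuracy(w, X, y))
```

A centralized baseline for comparison would call `local_update` once on the pooled `(X, y)` with the same total number of gradient steps. The sketch only shows the data flow of the comparison and says nothing about the results reported for the ECG models.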