The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.
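The identity-gated aggregation described above can be illustrated with a minimal sketch. It is not the TBFL implementation: the function names, the credential layout, and the issuer key are hypothetical, and an HMAC stands in for the asymmetric signature (for example Ed25519) that a real Verifiable Credential proof would carry. The sketch only shows the principle that updates are filtered by credential validity before a sample-weighted average is computed.

```python
import hashlib
import hmac
import json

# Hypothetical issuer secret. Assumption: a real SSI deployment would verify an
# asymmetric signature against the issuer's public key resolved from its DID.
ISSUER_KEY = b"demo-issuer-key"


def issue_credential(did: str) -> dict:
    """Issue a toy credential binding a DID to a healthcare-entity role."""
    payload = json.dumps({"did": did, "role": "healthcare-entity"}, sort_keys=True)
    proof = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "proof": proof}


def verify_credential(cred: dict, did: str) -> bool:
    """Accept only credentials with a valid proof that name the presenting DID."""
    expected = hmac.new(ISSUER_KEY, cred["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cred["proof"]):
        return False
    return json.loads(cred["payload"]).get("did") == did


def aggregate(updates):
    """Sample-weighted average over authenticated updates only.

    updates: list of (did, credential, weights, n_samples) tuples.
    Returns (averaged_weights, number_of_accepted_updates).
    """
    accepted = [(w, n) for did, cred, w, n in updates if verify_credential(cred, did)]
    if not accepted:
        raise ValueError("no authenticated updates")
    total = sum(n for _, n in accepted)
    dim = len(accepted[0][0])
    averaged = [sum(w[i] * n for w, n in accepted) / total for i in range(dim)]
    return averaged, len(accepted)


if __name__ == "__main__":
    cred_a = issue_credential("did:example:hospital-a")
    cred_b = issue_credential("did:example:hospital-b")
    forged = {"payload": cred_a["payload"], "proof": "00" * 32}
    updates = [
        ("did:example:hospital-a", cred_a, [1.0, 2.0], 100),
        ("did:example:hospital-b", cred_b, [3.0, 4.0], 300),
        ("did:example:sybil-1", forged, [1000.0, 1000.0], 1000),  # forged proof
        ("did:example:sybil-2", cred_a, [1000.0, 1000.0], 1000),  # replayed credential
    ]
    print(aggregate(updates))
```

In this toy setting, a Sybil participant is rejected regardless of how plausible its update looks, because acceptance depends on the credential check rather than on the behavior of the submitted weights.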