Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials

Rodrigo Tertulino, Ricardo Almeida, Laercio Alencar

The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. However, stringent privacy regulations such as GDPR and HIPAA have created data silos that prevent centralized training. Federated Learning (FL) has emerged as a promising solution that enables collaborative model training without sharing raw patient data. Despite its potential, FL remains vulnerable to poisoning and Sybil attacks, in which malicious participants corrupt the global model or infiltrate the network using fake identities. While recent approaches integrate Blockchain technology for auditability, they predominantly rely on probabilistic reputation systems rather than robust cryptographic identity verification. This paper proposes a Trustworthy Blockchain-based Federated Learning (TBFL) framework integrating Self-Sovereign Identity (SSI) standards. By leveraging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), our architecture ensures only authenticated healthcare entities contribute to the global model. Through comprehensive evaluation using the MIMIC-IV dataset, we demonstrate that anchoring trust in cryptographic identity verification rather than behavioral patterns significantly mitigates security risks while maintaining clinical utility. Our results show the framework successfully neutralizes 100% of Sybil attacks, achieves robust predictive performance (AUC = 0.954, Recall = 0.890), and introduces negligible computational overhead (<0.12%). The approach provides a secure, scalable, and economically viable ecosystem for inter-institutional health data collaboration, with total operational costs of approximately $18 for 100 training rounds across multiple institutions.
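The identity gate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `did:example:` identifiers, and the issuer key are all hypothetical, and an HMAC tag from the Python standard library stands in for the asymmetric signatures that real Verifiable Credentials carry. The sketch only shows the idea that an update is averaged into the global model if, and only if, the sender's credential verifies.

```python
import hashlib
import hmac
import json

# Hypothetical issuer secret. Real Verifiable Credentials use asymmetric
# signatures checked against the issuer's DID document; HMAC is a
# standard-library stand-in for this sketch only.
ISSUER_KEY = b"demo-issuer-key"


def issue_credential(did: str, role: str) -> dict:
    """Return a toy credential binding a DID to a role, tagged by the issuer."""
    claims = {"id": did, "role": role}
    payload = json.dumps(claims, sort_keys=True).encode()
    proof = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "proof": proof}


def verify_credential(cred: dict) -> bool:
    """Recompute the tag over the claims and compare in constant time."""
    payload = json.dumps(cred["claims"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred["proof"])


def aggregate(updates: list) -> list:
    """Average weight vectors from participants whose credential verifies."""
    accepted = [u["weights"] for u in updates if verify_credential(u["credential"])]
    if not accepted:
        raise ValueError("no authenticated updates to aggregate")
    n = len(accepted)
    return [sum(col) / n for col in zip(*accepted)]


hospital_a = issue_credential("did:example:hospital-a", "hospital")
hospital_b = issue_credential("did:example:hospital-b", "hospital")
# A Sybil node fabricates a credential without access to the issuer key.
sybil = {"claims": {"id": "did:example:fake", "role": "hospital"}, "proof": "00" * 32}

updates = [
    {"credential": hospital_a, "weights": [1.0, 2.0]},
    {"credential": hospital_b, "weights": [3.0, 4.0]},
    {"credential": sybil, "weights": [100.0, -100.0]},
]
global_weights = aggregate(updates)
print(global_weights)  # [2.0, 3.0]: the Sybil update is excluded
```

The poisoned update is dropped before averaging rather than down-weighted afterwards, which is the difference between identity-based admission and the reputation-based filtering the abstract contrasts it with.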
