We propose FedScore, a privacy-preserving federated learning framework for generating scoring systems across multiple sites, intended to facilitate cross-institutional collaboration. The FedScore framework comprises five modules: federated variable ranking, federated variable transformation, federated score derivation, federated model selection, and federated model evaluation. To illustrate its usage and assess its performance, we built a hypothetical global scoring system for predicting mortality within 30 days after an emergency department visit, using 10 simulated sites created by partitioning data from a tertiary hospital in Singapore. For comparison, we used a pre-existing score generator to construct 10 local scoring systems, one independently at each site, and we also developed a scoring system using centralized (pooled) data. We compared the performance of the resulting FedScore model with that of the other scoring models using receiver operating characteristic (ROC) analysis. The FedScore model achieved an average area under the curve (AUC) of 0.763 across all sites, with a standard deviation (SD) of 0.020. We also calculated the average AUC and SD across sites for each local model. The FedScore model showed promising accuracy and stability: its high average AUC was the closest to that of the pooled model, and its SD was lower than that of most local models. This study demonstrates that FedScore is a privacy-preserving scoring system generator with potentially good generalizability.
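The cross-site evaluation described above (one AUC per site, then an average and SD across sites) can be illustrated with a minimal sketch. This is not the FedScore implementation: the function names `auc` and `summarize_across_sites`, the toy data, and the use of the sample SD (n - 1 denominator) are all assumptions made for illustration, since the abstract does not specify these details.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_across_sites(site_data):
    """site_data maps a site name to (labels, scores) for one model.
    Returns the mean AUC and the sample SD across sites (assumed convention)."""
    aucs = [auc(labels, scores) for labels, scores in site_data.values()]
    return mean(aucs), stdev(aucs)


# Made-up toy data: 1 = died within 30 days, scores = risk score points.
toy_sites = {
    "site_A": ([0, 0, 1, 1], [1, 2, 3, 4]),
    "site_B": ([0, 1, 0, 1], [1, 2, 3, 4]),
    "site_C": ([0, 1, 1, 0], [2, 2, 5, 1]),
}

mean_auc, sd_auc = summarize_across_sites(toy_sites)
print(f"mean AUC = {mean_auc:.3f}, SD = {sd_auc:.3f}")
```

Applying the same summary to each candidate model (federated, local, pooled) yields one mean and one SD per model, which is the kind of comparison the abstract reports: a higher mean suggests better accuracy, and a lower SD suggests more stable performance across sites.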