For rare diseases, identifying predictive factors via multivariable statistical analysis is often impossible because the available data sets are too small. Combining data from different medical centers into a single (larger) database would alleviate this problem, but is challenging in practice due to regulatory and logistical obstacles. Federated Learning (FL) is a machine learning approach that aims to construct, from local inferences in separate data centers, what would have been inferred had the data sets been merged. It seeks to harvest the statistical power of larger data sets without actually creating them. The FL strategy is not always feasible for small data sets. Therefore, in this paper we refine and implement an alternative Bayesian Federated Inference (BFI) framework for multicenter data with the same aim as FL. The BFI framework is designed to cope with small data sets by inferring locally not only the optimal parameter values, but also additional features of the posterior parameter distribution, thereby capturing information beyond what is used in FL. BFI has the additional benefit that a single inference cycle across the centers is sufficient, whereas FL needs multiple cycles. We quantify the performance of the proposed methodology on simulated and real-life data.
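The single-cycle idea can be illustrated on a toy problem. The sketch below is not the paper's implementation. It assumes a linear model with known noise variance and the same zero-mean Gaussian prior at every center, so each local posterior is exactly Gaussian and is fully described by its mode and its curvature (precision matrix). The function names `local_fit` and `combine`, the prior precision, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def local_fit(X, y, prior_precision, noise_var=1.0):
    """Local step (assumed Gaussian linear model): return the posterior
    mode and the posterior precision (curvature at the mode)."""
    A = X.T @ X / noise_var + prior_precision
    theta = np.linalg.solve(A, X.T @ y / noise_var)
    return theta, A

def combine(local_results, prior_precision):
    """One-shot combination of local modes and curvatures.
    The shared prior enters every local posterior, so it is
    subtracted (L - 1) times to count it only once."""
    n_centers = len(local_results)
    A = sum(A_l for _, A_l in local_results) - (n_centers - 1) * prior_precision
    b = sum(A_l @ theta_l for theta_l, A_l in local_results)
    return np.linalg.solve(A, b), A

# Simulated data for three small centers (illustrative values only).
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
prior_precision = 0.1 * np.eye(3)
centers = []
for n in (15, 20, 25):
    X = rng.normal(size=(n, 3))
    y = X @ true_theta + rng.normal(size=n)
    centers.append((X, y))

# Each center shares only (mode, curvature), never its raw data.
local_results = [local_fit(X, y, prior_precision) for X, y in centers]
theta_combined, A_combined = combine(local_results, prior_precision)

# Reference: the fit one would obtain if the data could be merged.
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
theta_pooled, A_pooled = local_fit(X_all, y_all, prior_precision)

print(np.allclose(theta_combined, theta_pooled))
```

In this Gaussian special case the combined estimate coincides with the pooled one after a single round of communication. For non-Gaussian models the same combination would only be an approximation based on the local curvature.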