Identifying predictive factors for an outcome of interest via multivariable analysis is often difficult when the data set is small. Combining data from different medical centers into a single (larger) database would alleviate this problem, but in practice this is challenging due to regulatory and logistical obstacles. Federated Learning (FL) is a machine learning approach that aims to construct, from local inferences in separate data centers, what would have been inferred had the data sets been merged. It seeks to harvest the statistical power of larger data sets without actually creating them. However, the FL strategy is not always efficient or precise. Therefore, in this paper we refine and implement an alternative Bayesian Federated Inference (BFI) framework for multicenter data with the same aim as FL. The BFI framework is designed to cope with small data sets by inferring locally not only the optimal parameter values but also additional features of the posterior parameter distribution, thereby capturing information beyond what is used in FL. BFI has the additional benefit that a single inference cycle across the centers is sufficient, whereas FL needs multiple cycles. We quantify the performance of the proposed methodology on simulated and real-life data.
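The idea of sending more than a point estimate from each center in a single round can be illustrated with a minimal sketch. It assumes logistic regression, a Gaussian (Laplace) approximation of each local posterior around its maximum a posteriori (MAP) estimate, and the same zero-mean Gaussian prior with precision matrix Λ at every center. Under these assumptions the merged log posterior is approximated by summing the K local quadratic approximations and removing the K-1 surplus copies of the prior, which gives A = Σ_k A_k - (K-1)Λ and θ = A⁻¹ Σ_k A_k θ_k, where θ_k is the local MAP estimate and A_k the local curvature (Hessian of the negative log posterior). The function names and simulated data are hypothetical, and this is not necessarily the exact estimator refined in the paper.

```python
import numpy as np


def local_map_and_curvature(X, y, prior_prec, n_iter=50):
    """Hypothetical local step: logistic-regression MAP fit via Newton's method.

    Returns the MAP estimate and the Hessian of the negative log posterior
    evaluated there; these two summaries are what a center would share.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (p - y) + prior_prec @ theta        # gradient of -log posterior
        hess = (X.T * (p * (1 - p))) @ X + prior_prec    # Hessian of -log posterior
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    hess = (X.T * (p * (1 - p))) @ X + prior_prec
    return theta, hess


def combine(local_results, prior_prec):
    """Hypothetical central step: one-shot combination of local summaries.

    Assumes every center used the same Gaussian prior, so the prior is
    counted K times in the sum and K-1 copies are subtracted.
    """
    K = len(local_results)
    A = sum(h for _, h in local_results) - (K - 1) * prior_prec
    b = sum(h @ t for t, h in local_results)
    return np.linalg.solve(A, b), A


# Simulated example: 4 centers, 200 patients each, intercept + 2 covariates.
rng = np.random.default_rng(0)
true_theta = np.array([0.5, -1.0, 0.8])
prior_prec = 0.1 * np.eye(3)

centers = []
for _ in range(4):
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_theta)))
    centers.append((X, y))

# Single inference cycle: each center reports (theta_k, A_k) once.
local = [local_map_and_curvature(X, y, prior_prec) for X, y in centers]
theta_bfi, A_bfi = combine(local, prior_prec)

# Reference: MAP fit on the (normally unavailable) pooled data.
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
theta_pooled, _ = local_map_and_curvature(X_all, y_all, prior_prec)

print("combined:", theta_bfi)
print("pooled:  ", theta_pooled)
```

Because each center transmits its MAP estimate and curvature matrix only once, no iterative exchange between centers and server is needed; in this sketch the combined estimate should lie close to the MAP fit on the pooled data, with the gap shrinking as the local sample sizes grow and the quadratic approximation becomes more accurate.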