Early analysis of outbreak data is critical for assessing an outbreak's potential impact and for informing interventions. However, data obtained early in outbreaks are often sensitive and subject to strict privacy restrictions. Federated analysis is decentralised, collaborative analysis in which no raw data are shared, and it has therefore emerged as an attractive paradigm for addressing data privacy and confidentiality. In the present study, we propose two approaches that require neither data sharing nor direct communication between devices or servers. The first approach approximates joint posterior distributions with a multivariate normal distribution and uses this approximation to update prior distributions sequentially. The second approach uses summaries of parameter posteriors obtained locally at different locations (sites) to perform a meta-analysis via a hierarchical model. We test these approaches on simulated and real outbreak data to estimate the incubation periods of multiple infectious diseases. Results indicate that both approaches recover incubation period parameters accurately, but they offer different inferential advantages. The approximation approach works with full posterior distributions and therefore quantifies uncertainty better, whereas the meta-analysis approach allows an explicit hierarchical structure, which can make some parameters more interpretable. We provide a framework for federated analysis of early outbreak data in complex public health contexts.
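The first approach (sequential updating with a multivariate normal approximation) can be sketched as follows. This is a minimal illustration, not the study's implementation. It rests on several assumptions of our own: the incubation period is log-normal, times are fully observed with no censoring, the data are simulated at three hypothetical sites, the local fit uses a random-walk Metropolis sampler, and the helper names `metropolis`, `mvn_summary` and `sequential_update` are hypothetical. Each site fits only its own data and passes on just the mean vector and covariance matrix of its posterior. The next site then uses those two summaries as a bivariate normal prior.

```python
import math
import random

# Hypothetical setup (assumption, not from the study): log-normal incubation periods,
# parameterised as theta = (mu, log_sigma) so both components are unbounded.
TRUE_MU, TRUE_SIGMA = 1.6, 0.4


def log_likelihood(theta, log_data):
    """Log-normal log-likelihood of fully observed (uncensored) times, given their logs."""
    mu, log_sigma = theta
    sigma = math.exp(log_sigma)
    return sum(-lx - log_sigma - 0.5 * math.log(2 * math.pi)
               - (lx - mu) ** 2 / (2 * sigma ** 2) for lx in log_data)


def mvn_logpdf(theta, mean, cov):
    """Bivariate normal log density, up to an additive constant (enough for MCMC)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (theta[0] - mean[0], theta[1] - mean[1])
    return -0.5 * sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


def metropolis(data, prior_mean, prior_cov, n_iter=6000, burn_in=2000, step=0.05, seed=0):
    """Random-walk Metropolis at one site; only this site's raw data are used."""
    rng = random.Random(seed)
    log_data = [math.log(x) for x in data]
    m = sum(log_data) / len(log_data)
    s = math.sqrt(sum((v - m) ** 2 for v in log_data) / len(log_data))
    theta = (m, math.log(s))  # start near the data to shorten burn-in

    def log_post(t):
        return log_likelihood(t, log_data) + mvn_logpdf(t, prior_mean, prior_cov)

    current = log_post(theta)
    samples = []
    for i in range(n_iter):
        proposal = (theta[0] + rng.gauss(0, step), theta[1] + rng.gauss(0, step))
        candidate = log_post(proposal)
        # Metropolis acceptance rule for a symmetric proposal.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


def mvn_summary(samples):
    """Moment-match a bivariate normal to posterior samples: (mean vector, covariance)."""
    n = len(samples)
    mean = tuple(sum(s[k] for s in samples) / n for k in range(2))
    cov = tuple(tuple(sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
                      for j in range(2)) for i in range(2))
    return mean, cov


def sequential_update(sites, prior_mean=(0.0, 0.0), prior_cov=((10.0, 0.0), (0.0, 10.0))):
    """Visit sites in turn; each site's MVN posterior summary becomes the next site's prior."""
    mean, cov = prior_mean, prior_cov
    for k, data in enumerate(sites):
        mean, cov = mvn_summary(metropolis(data, mean, cov, seed=k))
    return mean, cov


data_rng = random.Random(42)
sites = [[data_rng.lognormvariate(TRUE_MU, TRUE_SIGMA) for _ in range(100)] for _ in range(3)]
final_mean, final_cov = sequential_update(sites)
print("mu ~", round(final_mean[0], 2), "sigma ~", round(math.exp(final_mean[1]), 2))
```

Only a two-element mean and a 2x2 covariance leave each site, so no raw incubation times are exchanged. The cost of this design is that the posterior's shape beyond its first two moments, such as skew in sigma, is discarded at every hand-off.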