Differentially private federated learning for localized control of infectious disease dynamics

During epidemics, a swift response is necessary to mitigate spread. Localized approaches to this response have several advantages: they limit the resources required and reduce the impact of interventions on a larger scale. However, training a separate machine learning (ML) model at the local scale is often not feasible because too little data is available, and centralizing the data is difficult because of its high sensitivity and the associated privacy constraints. In this study, we consider a localized strategy based on German counties and communities, which are managed by their respective local health authorities (LHAs). So that privacy protection does not come at the cost of detailed situational data, we propose a privacy-preserving forecasting method that can assist public health experts and decision makers. Federated learning (FL) trains a shared ML model without centralizing raw data. Treating counties, communities, or LHAs as clients, and seeking a balance between utility and privacy, we study an FL framework with client-level differential privacy (DP). We train a shared multilayer perceptron on sliding windows of recent case counts to forecast the number of cases; clients exchange only norm-clipped updates, and the server aggregates the updates with DP noise. We evaluate the approach on county-level COVID-19 data from two epidemic phases. As expected, very strict privacy yields unstable, unusable forecasts. At a moderately strong privacy level, the DP model closely approaches the non-DP model: R² of about 0.94 (vs. 0.95) and a mean absolute percentage error (MAPE) of 26% in November 2020; R² of about 0.88 (vs. 0.93) and a MAPE of 21% in March 2022. Overall, client-level DP-FL can deliver useful county-level predictions with strong privacy guarantees, and the viable privacy budget depends on the epidemic phase. This allows privacy-compliant collaboration among health authorities for local forecasting.
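
The pipeline described above (sliding windows over case counts, norm-clipped client updates, and noisy server aggregation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the parameters `window`, `clip_norm`, and `noise_multiplier`, and the choice of Gaussian noise are all assumptions made for illustration, since the abstract states only that updates are norm-clipped and aggregated with DP noise.

```python
import numpy as np

def sliding_windows(cases, window):
    """Split a 1-D case-count series into (inputs, targets) pairs.

    Each input row holds `window` consecutive counts; the target is the
    count that immediately follows. (Hypothetical helper.)
    """
    cases = np.asarray(cases, dtype=float)
    X = np.stack([cases[i:i + window] for i in range(len(cases) - window)])
    y = cases[window:]
    return X, y

def clip_update(update, clip_norm):
    """Scale a client's update so its L2 norm is at most `clip_norm`."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates after adding Gaussian noise.

    Assumption: Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added to the sum of clipped updates.
    """
    clipped = [clip_update(u, clip_norm) for u in updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)

# Toy usage with made-up numbers: three "counties", each sending a flattened update.
rng = np.random.default_rng(0)
X, y = sliding_windows([10, 12, 15, 19, 24], window=3)
client_updates = [rng.normal(size=4) for _ in range(3)]
global_update = dp_aggregate(client_updates, clip_norm=1.0,
                             noise_multiplier=0.5, rng=rng)
```

In this sketch, clipping bounds how much any single client can shift the aggregate, which is what lets the added noise mask each client's contribution. A larger `noise_multiplier` corresponds to stricter privacy and noisier forecasts, consistent with the trade-off reported above.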
