This work investigates the potential of Federated Learning (FL) for official statistics and shows how closely the performance of FL models can match that of centralized learning methods. At the same time, FL can safeguard the privacy of data holders, thus facilitating access to a broader range of data and ultimately enhancing official statistics. We simulate three use cases to gain insights into the applicability of the technology. The use cases are based on a medical insurance data set, a fine dust pollution data set, and a mobile radio coverage data set, all of which come from domains close to official statistics. We provide a detailed analysis of the results, including a comparison of centralized and FL algorithm performance for each simulation. In all three use cases, we were able to train models via FL that reach a performance very close to the centralized model benchmarks. We summarize our key observations and their implications for transferring the simulations into practice. We conclude that FL has the potential to emerge as a pivotal technology in future use cases of official statistics.