This work investigates the potential of Federated Learning (FL) for official statistics and shows how closely the performance of FL models can match that of centralized learning methods. FL is particularly interesting for official statistics because it can safeguard the privacy of data holders, thus facilitating access to a broader range of data. By simulating three different use cases, we gain important insights into the applicability of the technology. The use cases are based on a medical insurance data set, a fine dust pollution data set, and a mobile radio coverage data set, all of which come from domains close to official statistics. We provide a detailed analysis of the results, including a comparison of centralized and FL algorithm performance for each simulation. In all three use cases, we were able to train models via FL that reach a performance very close to the centralized model benchmarks. We summarize our key observations and their implications for transferring the simulations into practice. We conclude that FL has the potential to emerge as a pivotal technology in future use cases of official statistics.