Cellular traffic prediction is essential for enabling 5G mobile networks to plan and manage infrastructure intelligently and efficiently. However, available data are limited to base station logging information. Hence, there is demand for training methods that generate high-quality predictions and generalize to new observations across diverse parties. Traditional approaches require collecting measurements from multiple base stations, transmitting them to a central entity, and conducting machine learning operations on the acquired data. The dissemination of local observations raises concerns regarding confidentiality and performance, which impede the applicability of machine learning techniques. Although various distributed learning methods have been proposed to address this issue, their application to traffic prediction remains largely unexplored. In this work, we investigate the efficacy of federated learning applied to raw base station LTE data for time-series forecasting. We evaluate one-step predictions using five different neural network architectures trained in a federated setting on non-identically distributed data. Our results show that the learning architectures adapted to the federated setting yield prediction error equivalent to that of the centralized setting. In addition, preprocessing techniques applied on base stations enhance forecasting accuracy, while advanced federated aggregators do not surpass simpler approaches. Simulations considering the environmental impact suggest that federated learning holds the potential for reducing carbon emissions and energy consumption. Finally, we consider a large-scale scenario with synthetic data and demonstrate that federated learning reduces computational and communication costs compared to centralized settings.
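To make the federated setting described above concrete, the following is a minimal sketch of one-step forecasting with sample-weighted federated averaging, used here as an example of a simple aggregator. It is not the implementation evaluated in this work. A linear autoregressive model stands in for the neural network architectures, and the traffic traces come from a hypothetical `synthetic_station` generator rather than LTE logs. All function names and hyperparameters (`lag`, `lr`, `epochs`, `rounds`) are illustrative assumptions.

```python
import math
import random


def make_windows(series, lag):
    """Build (past `lag` values, next value) pairs for one-step prediction."""
    return [(series[i:i + lag], series[i + lag]) for i in range(len(series) - lag)]


def predict(weights, x):
    # weights = [w_1, ..., w_lag, bias]; zip stops before the bias term
    return sum(w * v for w, v in zip(weights, x)) + weights[-1]


def mse(weights, samples):
    return sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def local_update(weights, samples, lr=0.05, epochs=5):
    """A few epochs of full-batch gradient descent on one client's own data."""
    w = list(weights)
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in samples:
            err = predict(w, x) - y
            for j, v in enumerate(x):
                grad[j] += 2 * err * v / n
            grad[-1] += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its local sample count."""
    total = sum(client_sizes)
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(len(client_weights[0]))
    ]


def synthetic_station(seed, length=200):
    """Hypothetical trace: a 24-step periodic signal whose scale and phase
    differ per station, so clients are not identically distributed."""
    rng = random.Random(seed)
    scale = rng.uniform(0.5, 1.0)
    phase = rng.uniform(0, math.pi)
    return [
        scale * (0.5 + 0.5 * math.sin(2 * math.pi * t / 24 + phase)) + rng.gauss(0, 0.02)
        for t in range(length)
    ]


def train_federated(clients, lag=4, rounds=30):
    """Only model weights leave a client; raw samples stay local."""
    global_w = [0.0] * (lag + 1)
    history = []
    for _ in range(rounds):
        updates = [local_update(global_w, samples) for samples in clients]
        global_w = fed_avg(updates, [len(s) for s in clients])
        history.append(sum(mse(global_w, s) for s in clients) / len(clients))
    return global_w, history


clients = [make_windows(synthetic_station(seed), lag=4) for seed in range(3)]
global_w, history = train_federated(clients)
print(f"mean client MSE: first round {history[0]:.4f}, last round {history[-1]:.4f}")
```

In this sketch each base station shares only its updated weights with the aggregator, which is the property that avoids disseminating local observations. Swapping `fed_avg` for another aggregation rule changes only the server-side step.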