As cellular networks evolve towards the sixth generation (6G), Machine Learning (ML) is seen as a key enabling technology for improving network capabilities. ML provides a methodology for predictive systems, which, in turn, can make networks proactive. This proactive behavior can be leveraged to sustain, for example, a specific Quality of Service (QoS) requirement. With predictive Quality of Service (pQoS), a wide variety of new use cases, both safety- and entertainment-related, are emerging, especially in the automotive sector. Therefore, in this work, we consider maximum throughput prediction, which can enhance, for example, streaming or HD mapping applications. We discuss the entire ML workflow, highlighting often overlooked aspects such as the detailed sampling procedures, the in-depth analysis of the dataset characteristics, the effect of data splits on the reported results, and the data availability. Reliable ML models face many challenges during their lifecycle. We highlight how confidence in ML technologies can be built by better understanding the underlying characteristics of the collected data. We discuss feature engineering and the effects of different splits on the training process, showing that random splits might overestimate performance by more than a factor of two. Moreover, we investigate diverse sets of input features, of which network information proved to be the most effective, cutting the error by half. Part of our contribution is the validation of multiple ML models in diverse scenarios. We also use Explainable AI (XAI) to show that ML can learn underlying principles of wireless networks without being explicitly programmed. Our data was collected from a deployed network that was under the full control of the measurement team and covers different vehicular scenarios and radio environments.
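The effect of the split on reported performance can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy data, the `run_id` field, and the helper names are hypothetical, and holding out whole measurement runs is only one possible alternative to a random split, assumed here for illustration. The sketch shows the mechanism only: a random split places samples from the same run in both the training and the test set, so the test set is no longer independent of the training data.

```python
import random
from collections import defaultdict


def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle individual samples and split, ignoring the run they came from."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def grouped_split(samples, test_fraction=0.2, seed=0):
    """Hold out whole runs so that no run appears in both train and test."""
    by_run = defaultdict(list)
    for sample in samples:
        by_run[sample["run_id"]].append(sample)
    run_ids = sorted(by_run)
    rng = random.Random(seed)
    rng.shuffle(run_ids)
    n_test_runs = max(1, int(round(len(run_ids) * test_fraction)))
    test_runs = set(run_ids[:n_test_runs])
    train = [s for s in samples if s["run_id"] not in test_runs]
    test = [s for s in samples if s["run_id"] in test_runs]
    return train, test


def shared_runs(train, test):
    """Return the run IDs that contribute samples to both sets."""
    return {s["run_id"] for s in train} & {s["run_id"] for s in test}


# Hypothetical toy data: 10 measurement runs with 50 consecutive samples each.
samples = [{"run_id": r, "t": t} for r in range(10) for t in range(50)]

rand_train, rand_test = random_split(samples)
grp_train, grp_test = grouped_split(samples)

print("runs shared by random split: ", len(shared_runs(rand_train, rand_test)))
print("runs shared by grouped split:", len(shared_runs(grp_train, grp_test)))
```

Because consecutive samples within one drive tend to be strongly correlated, a model evaluated on the random split is tested on near-duplicates of its training data, which is one plausible reason for the overestimated performance.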