With the rapid evolution of mobile core network (MCN) architectures, large-scale control-plane traffic (CPT) traces are critical for the R&D community to study MCN design and performance optimization. The prior-art control-plane traffic generator, SMM, relies heavily on domain knowledge and therefore must be redesigned as the domain evolves. In this work, we study the feasibility of developing a high-fidelity MCN control-plane traffic generator by leveraging generative ML models. We identify key challenges in synthesizing high-fidelity CPT, including requirements shared with data-plane traffic, such as multimodal feature relationships, and requirements unique to CPT, such as stateful semantics and long-term (time-of-day) data variations. We show that state-of-the-art generative adversarial network (GAN)-based approaches, which have been shown to work well for data-plane traffic, cannot meet these fidelity requirements of CPT. We therefore develop a transformer-based model, CPT-GPT, that accurately captures complex dependencies among the samples in each traffic stream (the control events of the same user equipment, UE) without the need for a GAN. Our evaluation of CPT-GPT on a large-scale control-plane traffic trace shows that (1) it does not rely on domain knowledge yet synthesizes control-plane traffic with fidelity comparable to that of SMM; and (2) compared to the prior-art GAN-based approach, it reduces the fraction of streams that violate stateful semantics by two orders of magnitude, the max y-distance of the sojourn time distributions of streams by 16.0%, and the transfer learning time for deriving new hourly models by 3.36x.
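The abstract does not define the "max y-distance" fidelity metric. A minimal sketch follows, assuming it means the largest vertical gap between the empirical CDFs of the real and synthesized sojourn times (the two-sample Kolmogorov-Smirnov statistic). The function name `max_y_distance` and the sample values are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def max_y_distance(real, synthetic):
    """Largest vertical gap between the empirical CDFs of two samples.

    Assumption: this is how "max y-distance" is read here; the source
    text does not spell out the definition.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs
    # at one of the observed values.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical sojourn times in seconds, for illustration only.
real_sojourn = [1.0, 2.0, 3.0, 4.0]
synthetic_sojourn = [3.0, 4.0, 5.0, 6.0]
distance = max_y_distance(real_sojourn, synthetic_sojourn)
```

Under this reading, a value of 0 means the two empirical distributions coincide, and a value of 1 means their supports do not overlap at all, so a lower value indicates higher fidelity.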