Accurate Network Traffic Classification (NTC) is increasingly constrained by limited labeled data and strict privacy requirements. While Network Traffic Generation (NTG) offers an effective way to mitigate data scarcity, conventional generative methods struggle to model the complex temporal dynamics of modern traffic, incur significant computational cost, or both. In this article, we address the NTG task using lightweight Generative Artificial Intelligence (GenAI) architectures, including transformer-based, state-space, and diffusion models designed for practical deployment. We conduct a systematic evaluation along four axes: (i) fidelity of the synthetic traffic, (ii) synthetic-only training, (iii) data augmentation under low-data regimes, and (iv) computational efficiency. Experiments on two heterogeneous datasets show that lightweight GenAI models preserve both static and temporal traffic characteristics, with transformer and state-space models closely matching real distributions across a comprehensive set of fidelity metrics. Classifiers trained solely on synthetic traffic achieve up to 87% F1-score on real data. In low-data settings, GenAI-driven augmentation improves NTC performance by up to 40%, substantially reducing the gap with full-data training. Overall, transformer-based models provide the best trade-off between fidelity and efficiency, enabling high-quality, privacy-aware traffic synthesis with modest computational overhead.