Foundation models have shown great promise across many fields of study. One potential application is computer network traffic analysis, where such models could grasp the complexities of network traffic dynamics and adapt to any specific task or network environment with minimal fine-tuning. Previous approaches have tokenized hex-level packet data and reused the transformer architectures of large language models. We propose a new, efficient graph-based alternative that operates at the flow level. Our approach represents network traffic as a dynamic spatio-temporal graph and employs a self-supervised link-prediction pretraining task to capture both the spatial and the temporal dynamics of this graph. To evaluate its effectiveness, we conduct few-shot learning experiments on three distinct downstream network tasks: intrusion detection, traffic classification, and botnet classification. Models fine-tuned from our pretrained base achieve an average performance increase of 6.87\% over models trained from scratch, demonstrating that they effectively learn general network traffic dynamics during pretraining. This result suggests that a large-scale version could serve as an operational foundation model.