Lens: A Foundation Model for Network Traffic

Network traffic refers to the amount of data being sent and received over the internet or any system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, the analysis of network traffic is challenging due to the diverse nature of data packets, which often feature heterogeneous headers and encrypted payloads lacking semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn representations from massive traffic data. However, these methods typically excel at either traffic understanding (classification) or traffic generation, but not both. To address this issue, we develop Lens, a foundation model for network traffic that leverages the T5 architecture to learn pre-trained representations from large-scale unlabeled data. By harnessing the encoder-decoder framework, which captures global information while preserving generative ability, our model learns better representations from raw data. To further enhance pre-training effectiveness, we design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results across various benchmark datasets demonstrate that the proposed Lens outperforms the baselines in most downstream tasks related to both traffic understanding and generation. Notably, it also requires much less labeled data for fine-tuning than current methods.
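The abstract does not specify how Lens tokenizes packets, how it chooses masked spans, or how the three task losses are weighted. The sketch below therefore shows only the generic T5-style span corruption that an MSP-like objective builds on, plus one possible way to combine three losses. It makes several assumptions. Packets are tokenized as hex bytes, which is hypothetical. The noise density of 15% and mean span length of 3 are borrowed from T5's defaults, not from Lens. The combined loss is a weighted sum with equal weights, which is also an assumption. The helper names `span_corrupt`, `sample_spans`, and `combined_loss` are illustrative only.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sentinel(i: int) -> str:
    # T5-style sentinel tokens: <extra_id_0>, <extra_id_1>, ...
    return f"<extra_id_{i}>"


def span_corrupt(tokens: Sequence[str],
                 spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Replace each (start, length) span with a sentinel in the encoder input.

    The decoder target lists each sentinel followed by the tokens it hid,
    and ends with one extra sentinel, as in T5's span-corruption objective.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < pos or length <= 0 or start + length > len(tokens):
            raise ValueError("spans must be in range, positive, and non-overlapping")
        inputs.extend(tokens[pos:start])   # keep the unmasked prefix
        inputs.append(sentinel(i))         # one sentinel stands in for the whole span
        targets.append(sentinel(i))
        targets.extend(tokens[start:start + length])
        pos = start + length
    inputs.extend(tokens[pos:])
    targets.append(sentinel(len(spans)))   # closing sentinel
    return inputs, targets


def sample_spans(n: int, noise_density: float = 0.15, mean_span_len: int = 3,
                 rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Pick non-overlapping spans covering roughly noise_density * n tokens.

    Simplification (assumption): one span per equal-sized segment, so spans
    never overlap; T5 itself samples span lengths randomly.
    """
    if n == 0:
        return []
    rng = rng or random.Random(0)
    num_noise = max(1, round(n * noise_density))
    num_spans = max(1, round(num_noise / mean_span_len))
    seg = n // num_spans
    spans = []
    for k in range(num_spans):
        length = min(mean_span_len, seg)
        start = k * seg + rng.randint(0, seg - length)
        spans.append((start, length))
    return spans


def combined_loss(l_msp: float, l_pop: float, l_htp: float,
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # Hypothetical: a weighted sum of the MSP, POP and HTP losses.
    w_msp, w_pop, w_htp = weights
    return w_msp * l_msp + w_pop * l_pop + w_htp * l_htp


if __name__ == "__main__":
    # Hypothetical tokens: the first ten bytes of an IPv4 header, as hex strings.
    packet = "45 00 00 3c 1c 46 40 00 40 06".split()
    inp, tgt = span_corrupt(packet, [(2, 2), (7, 1)])
    print(inp)  # ['45', '00', '<extra_id_0>', '1c', '46', '40', '<extra_id_1>', '40', '06']
    print(tgt)  # ['<extra_id_0>', '00', '3c', '<extra_id_1>', '00', '<extra_id_2>']
```

The encoder sees only the corrupted input, while the decoder must regenerate the hidden spans. This is why an encoder-decoder model can be trained for understanding and generation at the same time.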


