Lens: A Knowledge-Guided Foundation Model for Network Traffic

Network traffic refers to the data sent and received over the Internet or any other system that connects computers. Analyzing network traffic is vital for security and management, yet it remains challenging because traffic mixes heterogeneous plain-text packet headers with encrypted payloads. To capture the latent semantics of traffic, recent studies have adopted Transformer-based pretraining techniques to learn network representations from massive traffic data. However, these methods pretrain on purely data-driven tasks and overlook network knowledge, which limits semantic understanding. For example, they may mask only some of the digits of a network port number and ask the model to predict them, even though a port number is an indivisible unit. In addition, they struggle to extend classification to new classes during finetuning because of distribution shift. Motivated by these limitations, we propose Lens, a unified knowledge-guided foundation model for both network traffic classification and generation. In pretraining, we propose a Knowledge-Guided Mask Span Prediction method with textual context for learning knowledge-enriched representations. To extend to new classes in finetuning, we reframe traffic classification as a closed-ended generation task and introduce context-aware finetuning to adapt to the distribution shift. Evaluation results across various benchmark datasets demonstrate that Lens achieves superior performance on both classification and generation tasks. For traffic classification, Lens substantially outperforms competitive baselines on 8 out of 12 tasks, with an average accuracy of 96.33%, and extends to novel classes with significantly better performance. For traffic generation, Lens generates higher-fidelity network traffic for network simulation, improving accuracy and F1 in fuzzing tests by up to 30.46% and 33.3%, respectively. We will open-source the code upon publication.
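To make the port-number example concrete, the following is a minimal sketch of field-aligned span masking, in which a field is either masked as a whole or left intact. It is an illustration under stated assumptions, not the Lens implementation: the function `mask_field_spans`, the `(name, tokens)` packet layout, the hex-byte tokens, and the T5-style `<extra_id_N>` sentinel format are all hypothetical choices made for this example.

```python
import random


def mask_field_spans(fields, mask_ratio=0.3, seed=0):
    """Mask whole header fields as spans, never part of a field.

    fields: list of (name, tokens) pairs, e.g. ("dst_port", ["01", "bb"]).
    Returns (inputs, targets) as flat token lists. Each masked field is
    replaced in the inputs by one sentinel token, and the targets list
    holds that sentinel followed by the field's original tokens.
    The sentinel naming is an assumption borrowed from T5-style span corruption.
    """
    rng = random.Random(seed)
    n_mask = min(len(fields), max(1, round(len(fields) * mask_ratio)))
    masked = set(rng.sample(range(len(fields)), n_mask))

    inputs, targets = [], []
    sentinel_id = 0
    for i, (_name, tokens) in enumerate(fields):
        if i in masked:
            tag = f"<extra_id_{sentinel_id}>"
            inputs.append(tag)       # the whole field collapses to one sentinel
            targets.append(tag)
            targets.extend(tokens)   # the model must predict the full field
            sentinel_id += 1
        else:
            inputs.extend(tokens)    # unmasked fields pass through unchanged
    return inputs, targets


# Hypothetical packet: each field is a list of hex-byte tokens.
packet = [
    ("src_port", ["c0", "1f"]),
    ("dst_port", ["01", "bb"]),  # 0x01bb = 443
    ("seq", ["00", "00", "00", "01"]),
    ("flags", ["18"]),
]

inputs, targets = mask_field_spans(packet, mask_ratio=0.5, seed=1)
print(inputs)
print(targets)
```

Because masking decisions are made per field rather than per token, the two bytes of a port number always end up together, either both visible in the inputs or both in the targets.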
