As network security receives widespread attention, encrypted traffic classification has become an active research focus. However, existing methods classify traffic without sufficiently considering the characteristics shared between data samples, which leads to suboptimal performance. Moreover, they train the packet-level and flow-level classification tasks independently, which is redundant because the packet representations learned in the packet-level task can be reused by the flow-level task. In this paper, we therefore propose an effective model named Contrastive Learning Enhanced Temporal Fusion Encoder (CLE-TFE). In particular, we use supervised contrastive learning to enhance the packet-level and flow-level representations, and we perform graph data augmentation on the byte-level traffic graph so that fine-grained, semantic-invariant characteristics between bytes can be captured through contrastive learning. We also propose cross-level multi-task learning, which accomplishes the packet-level and flow-level classification tasks simultaneously in the same model within a single training run. Experiments show that CLE-TFE achieves the best overall performance on the two tasks, while its computational overhead, measured in floating-point operations (FLOPs), is only about 1/14 that of pre-trained models (e.g., ET-BERT). The code is released at https://github.com/ViktorAxelsen/CLE-TFE
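To make the idea of supervised contrastive learning concrete, the following is a minimal NumPy sketch of a generic supervised contrastive loss: embeddings that share a class label are pulled together, and all others are pushed apart. It is an illustration only, not the CLE-TFE implementation; the function name `supervised_contrastive_loss`, the `temperature` default, and the toy inputs are assumptions introduced here, and the sketch assumes at least two samples with non-zero embeddings.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative sketch, hypothetical name).

    embeddings: (n, d) array of packet- or flow-level representations, n >= 2.
    labels:     (n,) array of class labels.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over the other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other samples with the same label as the anchor.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    # Average log-probability of the positives, per anchor, then over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy batch: two tight clusters of embeddings.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned = supervised_contrastive_loss(toy, [0, 0, 1, 1])     # labels match clusters
misaligned = supervised_contrastive_loss(toy, [0, 1, 0, 1])  # labels cut across clusters
```

In this toy batch the loss is small when the labels agree with the geometric clusters and large when they cut across them, which is the signal such a loss provides to an encoder during training.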