Encrypted traffic classification (TC) methods must adapt to new protocols and extensions as well as to advancements in other machine learning fields. In this paper, we adopt a transfer learning setup best known from computer vision. We first pretrain an embedding model on a complex task with a large number of classes and then transfer it to seven established TC datasets. The pretraining task is recognition of SNI domains in encrypted QUIC traffic, which in itself is a challenge for network monitoring due to the growing adoption of TLS Encrypted Client Hello. Our training pipeline -- featuring a disjoint class setup, ArcFace loss function, and a modern deep learning architecture -- aims to produce universal embeddings applicable across tasks. A transfer method based on model fine-tuning surpassed SOTA performance on nine of ten downstream TC tasks, with an average improvement of 6.4%. Furthermore, a comparison with a baseline method using raw packet sequences revealed unexpected findings with potential implications for the broader TC field. We released the model architecture, trained weights, and codebase for transfer learning experiments.
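The abstract names ArcFace as the loss driving the embedding pretraining. As a point of reference, the core idea of ArcFace (an additive angular margin on the softmax logits) can be sketched in plain NumPy; the function names, margin `m=0.5`, and scale `s=64.0` below are illustrative defaults, not the paper's actual hyperparameters.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Illustrative ArcFace (additive angular margin) logits.

    embeddings: (N, D) feature vectors from the backbone
    weights:    (C, D) learnable class-centre vectors
    labels:     (N,)   integer class ids
    """
    # L2-normalise both sides so dot products are cosines of angles
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)        # (N, C) cosine similarities
    theta = np.arccos(cos)
    # Add the angular margin m only at the ground-truth class,
    # which pushes embeddings of the same class closer on the hypersphere.
    idx = np.arange(len(labels))
    cos_margin = cos.copy()
    cos_margin[idx, labels] = np.cos(theta[idx, labels] + m)
    return s * cos_margin                     # scaled logits for softmax CE

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over the batch, computed stably in log-space."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With `m=0` the function reduces to an ordinary scaled cosine softmax; the margin makes the loss strictly harder at the target class, which is what encourages the tight, transferable class clusters the pretraining relies on.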