We present MERLOT, a scalable mixture-of-experts (MoE)-based refinement of distilled large language models optimized for encrypted traffic classification. Through model distillation in a teacher-student paradigm, compact models derived from GPT-2-base retain high classification accuracy while minimizing computational cost. These models serve as specialized experts in an MoE architecture, dynamically assigned to inputs by a gating network. Unlike generation-based methods, our approach classifies encrypted traffic directly from the final decoder token, taking contextual feature embeddings as input. Experiments on 10 datasets show performance superior or comparable to state-of-the-art models while substantially reducing resource demands, underscoring the method's effectiveness and robustness.
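To make the architecture concrete, the following is a minimal sketch (not the authors' code) of the pattern the abstract describes: a gating network routes a traffic sample's contextual feature embedding among compact expert models, and the class prediction is read from the final decoder token's hidden state. All names and hyperparameters (`DistilledExpert`, `n_experts`, `d_model`, `n_classes`) are hypothetical, and the expert is approximated with a small Transformer stack rather than an actual distilled GPT-2.

```python
# Hedged sketch of an MoE classifier over distilled experts (assumptions labeled).
import torch
import torch.nn as nn


class DistilledExpert(nn.Module):
    """Stand-in for a compact decoder distilled from GPT-2-base (hypothetical)."""

    def __init__(self, d_model: int, n_classes: int):
        super().__init__()
        # A decoder-only stack is approximated here with a small Transformer encoder.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.decoder(x)          # (batch, seq, d_model)
        return self.head(h[:, -1])   # classify from the final token's state


class MoEClassifier(nn.Module):
    def __init__(self, n_experts: int, d_model: int, n_classes: int):
        super().__init__()
        self.experts = nn.ModuleList(
            DistilledExpert(d_model, n_classes) for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate on the mean-pooled embedding; hard top-1 routing is one plausible
        # scheme for assigning each sample to a single specialized expert.
        scores = self.gate(x.mean(dim=1))               # (batch, n_experts)
        top1 = scores.argmax(dim=-1)                    # (batch,)
        # For clarity every expert runs here; a real system would dispatch only
        # to the selected expert to realize the efficiency gains.
        logits = torch.stack([e(x) for e in self.experts], dim=1)
        return logits[torch.arange(x.size(0)), top1]    # (batch, n_classes)


if __name__ == "__main__":
    model = MoEClassifier(n_experts=4, d_model=64, n_classes=10)
    emb = torch.randn(2, 16, 64)  # contextual feature embeddings for 2 flows
    print(model(emb).shape)       # torch.Size([2, 10])
```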