Accurate and efficient network traffic classification is important for many network management tasks, from traffic prioritization to anomaly detection. Although classifiers that use pre-computed flow statistics (e.g., packet sizes, inter-arrival times) can be efficient, they may be less accurate than techniques based on raw traffic, including packet captures. Past work has shown that representation learning-based classifiers applied to network traffic captures are more accurate, but they are slower and require considerably more memory because of the substantial cost of feature preprocessing. In this paper, we explore this trade-off and develop the Adaptive Constraint-Driven Classification (AC-DC) framework, which efficiently curates a pool of classifiers with different target requirements. AC-DC aims to provide classification performance comparable to that of complex packet-capture classifiers while adapting to varying network traffic load. It uses an adaptive scheduler that tracks current system memory availability and incoming traffic rates to select the classifier and batch size that maximize classification performance under the given memory and processing constraints. Our evaluation shows that AC-DC improves classification performance by more than 100% compared to classifiers that rely on flow statistics alone; compared to state-of-the-art packet-capture classifiers, AC-DC achieves comparable performance (less than 12.3% lower in F1-Score) while processing traffic over 150x faster.
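As a rough illustration of the scheduling idea described in the abstract, the sketch below picks, from a pool of classifiers, the one with the highest expected F1-Score that can keep up with the incoming traffic rate and still fit a batch in free memory. It is not the AC-DC implementation: the `ClassifierProfile` fields, the `pick_classifier` function, the `max_batch` cap, and all numbers are hypothetical and chosen only to make the selection logic concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ClassifierProfile:
    """Hypothetical offline profile of one classifier in the pool."""
    name: str
    f1_score: float         # expected classification performance
    flows_per_sec: float    # sustained processing rate
    model_mem_mb: float     # fixed memory footprint of the model
    mem_per_flow_mb: float  # additional memory per flow in a batch


def pick_classifier(
    pool: List[ClassifierProfile],
    free_mem_mb: float,
    incoming_flows_per_sec: float,
    max_batch: int = 1024,
) -> Optional[Tuple[ClassifierProfile, int]]:
    """Return (classifier, batch_size) with the best F1 that meets both
    the throughput and the memory constraint, or None if none is feasible."""
    best: Optional[Tuple[ClassifierProfile, int]] = None
    for profile in pool:
        # Processing constraint: the classifier must keep up with traffic.
        if profile.flows_per_sec < incoming_flows_per_sec:
            continue
        # Memory constraint: the model plus at least one flow must fit.
        budget_mb = free_mem_mb - profile.model_mem_mb
        if budget_mb < profile.mem_per_flow_mb:
            continue
        batch = min(max_batch, int(budget_mb // profile.mem_per_flow_mb))
        if best is None or profile.f1_score > best[0].f1_score:
            best = (profile, batch)
    return best


# Illustrative pool; the figures are invented, not taken from the paper.
POOL = [
    ClassifierProfile("flow-stats", 0.40, 100_000.0, 10.0, 0.001),
    ClassifierProfile("pcap-deep", 0.90, 500.0, 2000.0, 1.0),
]

if __name__ == "__main__":
    # Light load, plenty of memory: the heavier classifier is feasible.
    print(pick_classifier(POOL, free_mem_mb=4096, incoming_flows_per_sec=100))
    # Heavy load: only the lightweight classifier keeps up.
    print(pick_classifier(POOL, free_mem_mb=4096, incoming_flows_per_sec=10_000))
```

Calling `pick_classifier` again whenever the measured memory or traffic rate changes gives the adaptive behavior in its simplest form. A real scheduler would also need to account for switching costs and measurement noise, which this sketch ignores.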