With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO, a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization, and it incorporates two techniques. The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. Previous research often uses representations that map packet sizes and packet arrival times to fixed-size bins; we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by running a hyperparameter optimization algorithm during the training stage. HO significantly improves accuracy for a given representation size or, equivalently, achieves comparable accuracy with smaller representations. The second component, EC (Early Classification of traffic), enables faster classification using a cascade of classifiers adapted to different exit times, where the decision to exit at a given stage is based on the classifier's level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but, in certain cases, also improves it. Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.
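The two ideas described above, binning with non-uniform edges and exiting a classifier cascade once confidence is high enough, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bin edges, the 0.9 confidence threshold, the stage lengths, the labels, and the names `histogram`, `toy_classifier`, and `early_classify` are all hypothetical, chosen only for illustration. In ECHO the edges would come from a hyperparameter search and each stage would be a trained model.

```python
from bisect import bisect_right
from typing import Callable, Sequence

# Hypothetical non-uniform bin edges over packet sizes (bytes).
# In ECHO these would be found by hyperparameter optimization; here they are made up.
NON_UNIFORM_EDGES = [64, 128, 1400]


def histogram(values: Sequence[int], edges: Sequence[int]) -> list[int]:
    """Count values into len(edges) + 1 bins delimited by the sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


def toy_classifier(prefix: Sequence[int]) -> tuple[str, float]:
    """Hypothetical stand-in for a trained model: returns (label, confidence)."""
    if not prefix:
        return "unknown", 0.0
    counts = histogram(prefix, NON_UNIFORM_EDGES)
    small_fraction = (counts[0] + counts[1]) / len(prefix)  # packets below 128 bytes
    label = "chat" if small_fraction >= 0.5 else "video"
    return label, max(small_fraction, 1.0 - small_fraction)


# A stage is (number of packets it may see, classifier for that prefix length).
Stage = tuple[int, Callable[[Sequence[int]], tuple[str, float]]]


def early_classify(
    packet_sizes: Sequence[int], stages: Sequence[Stage], threshold: float
) -> tuple[str, int]:
    """Run the stages in order and exit at the first confident one.

    Returns (label, packets_used). The last stage always answers,
    even if its confidence is below the threshold.
    """
    label, used = "unknown", 0
    for n_packets, classifier in stages:
        prefix = packet_sizes[:n_packets]
        used = len(prefix)
        label, confidence = classifier(prefix)
        if confidence >= threshold:
            break
    return label, used


stages = [(4, toy_classifier), (8, toy_classifier)]
clear_flow = [1400] * 8                                # unambiguous after 4 packets
mixed_flow = [60, 1400, 60, 1400, 60, 60, 60, 60]      # needs the final stage
print(early_classify(clear_flow, stages, 0.9))
print(early_classify(mixed_flow, stages, 0.9))
```

In this sketch the unambiguous flow exits at the first stage after four packets, while the mixed flow falls through to the final stage. That is the mechanism by which a cascade can lower average latency without changing the decision for hard cases.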