The adoption of modern encryption protocols such as TLS 1.3 has significantly challenged traditional network traffic classification (NTC) methods. As a consequence, researchers are increasingly turning to machine learning (ML) approaches to overcome these obstacles. In this paper, we comprehensively analyze ML-based NTC studies, developing a taxonomy of their design choices, benchmarking suites, and prevalent assumptions impacting classifier performance. Through this systematization, we demonstrate widespread reliance on outdated datasets, oversights in design choices, and the consequences of unsubstantiated assumptions. Our evaluation reveals that the majority of proposed encrypted traffic classifiers have mistakenly been trained and evaluated on unencrypted traffic because they rely on legacy datasets. Furthermore, by conducting 348 feature occlusion experiments on state-of-the-art classifiers, we show how oversights in NTC design choices lead to overfitting, and validate or refute prevailing assumptions with empirical evidence. By highlighting lessons learned, we offer strategic insights, identify emerging research directions, and recommend best practices to support the development of NTC methodologies that are applicable in real-world settings.