Classifying the application type of network traffic with deep learning models has recently gained popularity. However, we find that these classifiers do not perform well on real-world traffic because of non-application-specific, generic background traffic originating from advertisements, analytics, shared APIs, and trackers. State-of-the-art application classifiers overlook such traffic: they rely on curated datasets and classify only the relevant application traffic. Labeling background traffic and training with an additional class for it does not resolve the issue; it introduces further confusion between application and background traffic, because the latter is heterogeneous and encompasses all traffic that is not relevant to the application sessions. To avoid falsely classifying background traffic as one of the relevant application types, a reliable confidence measure is needed so that the classifier can refrain from classifying uncertain samples. We therefore design a Gaussian Mixture Model-based classification framework that improves the indication of the deep learning classifier's confidence and thereby allows more reliable classification.
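The abstract does not specify how the Gaussian Mixture Model is combined with the classifier, so the following is only a minimal sketch of one plausible reading: each application class is described by a mixture over the classifier's embedding space, and a sample is left unclassified when even the best-matching class assigns it a low likelihood. Everything named here is an assumption made for illustration: the helper names `gmm_log_likelihood` and `classify_or_abstain`, the diagonal covariances, the class labels, the mixture parameters (hand-picked, not fitted), and the threshold value.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-density of point x under a diagonal-covariance Gaussian mixture.

    components: list of (weight, means, variances) tuples (hypothetical values).
    """
    log_terms = []
    for weight, means, variances in components:
        log_pdf = 0.0
        for xi, mu, var in zip(x, means, variances):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(math.log(weight) + log_pdf)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify_or_abstain(x, class_gmms, threshold):
    """Return (label, score); label is None when the best score is below threshold."""
    scores = {label: gmm_log_likelihood(x, comps) for label, comps in class_gmms.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]  # abstain: likely background or unknown traffic
    return best, scores[best]


# Hypothetical 2-D embeddings; parameters are illustrative, not fitted to data.
CLASS_GMMS = {
    "video": [(0.6, (0.0, 0.0), (1.0, 1.0)), (0.4, (1.0, 1.0), (0.5, 0.5))],
    "chat": [(1.0, (5.0, 5.0), (1.0, 1.0))],
}
THRESHOLD = -6.0  # assumed; in practice this would be tuned on held-out data

if __name__ == "__main__":
    print(classify_or_abstain((0.2, -0.1), CLASS_GMMS, THRESHOLD))  # near "video"
    print(classify_or_abstain((2.5, 8.0), CLASS_GMMS, THRESHOLD))   # far from both
```

In this sketch the first point lies close to the "video" mixture and is labeled accordingly, while the second point lies far from every class mixture and is rejected instead of being forced into an application class. In practice the mixture parameters would be estimated from training embeddings, for example with expectation-maximization.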