With the increasing prevalence of encrypted network traffic, cyber security analysts have been turning to machine learning (ML) techniques to elucidate the traffic on their networks. However, ML models can become stale as new traffic emerges that falls outside the distribution of the training set. To adapt reliably in this dynamic environment, ML models must also provide contextualized uncertainty quantification for their predictions, a topic that has received little attention in the cyber security domain. Uncertainty quantification is needed to signal both when the model is uncertain about which class to assign and when the traffic is unlikely to belong to any of the classes it was trained on. We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories. We also present an ML framework that is designed to train rapidly with modest data requirements and to provide both calibrated predictive probabilities and an interpretable "out-of-distribution" (OOD) score that flags novel traffic samples. We describe how to calibrate OOD scores using p-values of the relative Mahalanobis distance. We demonstrate that our framework achieves an F1 score of 0.98 on our dataset and that it can extend to an enterprise network by testing the model (1) on data from similar applications, (2) on dissimilar application traffic from an existing category, and (3) on application traffic from a new category. The model correctly flags uncertain traffic and, upon retraining, accurately incorporates the new data.
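To make the OOD calibration step concrete, the following is a minimal sketch of one way a relative Mahalanobis distance could be turned into a p-value. It is not the implementation described in the paper. It assumes that each traffic sample is already represented as a fixed-length feature vector, that class-conditional distances share a pooled covariance while a single "background" Gaussian is fit to all training data, and that the p-value is an empirical one computed against scores from held-out in-distribution data. The function names (`fit_rmd`, `rmd_score`, `empirical_p_values`, `demo`) and the synthetic two-dimensional data are hypothetical and serve only as illustration.

```python
import numpy as np


def fit_rmd(X, y, ridge=1e-6):
    """Fit class means, a pooled within-class covariance, and a background Gaussian."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.stack([X[idx == k].mean(axis=0) for k in range(len(classes))])
    centered = X - means[idx]                      # remove each sample's class mean
    pooled_cov = centered.T @ centered / len(X)    # covariance shared by all classes
    background_cov = np.cov(X, rowvar=False, bias=True)
    eye = ridge * np.eye(X.shape[1])               # small ridge for numerical stability
    return {
        "means": means,
        "precision": np.linalg.inv(pooled_cov + eye),
        "background_mean": X.mean(axis=0),
        "background_precision": np.linalg.inv(background_cov + eye),
    }


def _squared_mahalanobis(X, mean, precision):
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, precision, diff)


def rmd_score(model, X):
    """Smallest class distance minus the background distance; larger means more novel."""
    class_d = np.stack(
        [_squared_mahalanobis(X, m, model["precision"]) for m in model["means"]],
        axis=1,
    )
    background_d = _squared_mahalanobis(
        X, model["background_mean"], model["background_precision"]
    )
    return (class_d - background_d[:, None]).min(axis=1)


def empirical_p_values(calibration_scores, test_scores):
    """Fraction of held-out in-distribution scores at least as large as each test score."""
    cal = np.sort(calibration_scores)
    n_at_least = len(cal) - np.searchsorted(cal, test_scores, side="left")
    return (1 + n_at_least) / (len(cal) + 1)


def demo(seed=0):
    """Synthetic illustration: three known classes plus one unseen cluster."""
    rng = np.random.default_rng(seed)
    centers = np.array([[4.0, 0.0], [-4.0, 0.0], [0.0, 4.0]])

    def sample(n_per_class):
        y = np.repeat(np.arange(len(centers)), n_per_class)
        return centers[y] + rng.normal(size=(len(y), 2)), y

    X_train, y_train = sample(200)
    X_cal, _ = sample(200)       # held-out in-distribution data for calibration
    X_in, _ = sample(100)        # further in-distribution test data
    X_ood = np.array([0.0, -8.0]) + 0.5 * rng.normal(size=(20, 2))  # unseen cluster

    model = fit_rmd(X_train, y_train)
    cal_scores = rmd_score(model, X_cal)
    p_in = empirical_p_values(cal_scores, rmd_score(model, X_in))
    p_ood = empirical_p_values(cal_scores, rmd_score(model, X_ood))
    return p_in, p_ood
```

Under these assumptions, a small p-value says that a sample's score is more extreme than almost all held-out in-distribution scores, which gives an analyst an interpretable threshold (for example, flagging samples with p below 0.05) rather than a raw distance. One known limitation of this kind of sketch is that the relative distance can be insensitive to novelty along directions in which the pooled and background covariances agree, so the choice of feature representation matters.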