Network traffic classification is important to network operators in their daily work, such as analyzing the usage patterns of multimedia applications and optimizing network configurations. Internet service providers (ISPs) that operate high-speed links expect network flow classifiers to classify flows accurately and early, using as few initial packets per flow as possible. These classifiers must also be robust to packet sequence disorders in candidate flows and able to detect unseen flow types that fall outside the existing classification scope. Existing methods do not meet these requirements well. In this paper, we develop FastFlow, a time-series flow classification method that accurately classifies each network flow as one of the known types or as unknown, and that dynamically selects the minimal number of packets needed to balance accuracy and efficiency. Toward these objectives, we first develop a flow representation process that represents packet streams at both per-packet and per-slot granularity, yielding precise packet statistics that are robust to packet sequence disorders. Second, we develop a sequential decision-based classification model that uses an LSTM architecture trained with reinforcement learning. For each flow, the model dynamically decides the minimal number of time-series data points it needs to confidently classify the flow as one of the known types or as unknown. We evaluated our method on public datasets and demonstrated its superior performance in early and accurate flow classification. We also discuss deployment insights from classifying over 22.9 million flows across seven application types and 33 content providers in a campus network over one week. They show that FastFlow requires an average of only 8.37 packets and 0.5 seconds to classify a flow, with over 91% accuracy for application types and over 96% accuracy for content providers.
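To make the per-slot idea concrete, the following is a minimal sketch of slot-based aggregation and why it tolerates packet reordering. It is not FastFlow's actual representation: the function name `per_slot_features`, the 0.1-second slot length, the `(timestamp, size, direction)` packet tuple, and the four counters per slot are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical packet record: (timestamp in seconds, size in bytes, direction),
# where direction is +1 for upstream and -1 for downstream.
Packet = Tuple[float, int, int]


def per_slot_features(packets: List[Packet], slot_len: float = 0.1):
    """Aggregate packets into fixed-length time slots.

    Returns one tuple per slot:
    (up_count, up_bytes, down_count, down_bytes).
    The result depends only on packet timestamps, not on the order in
    which packets appear in the input list.
    """
    if not packets:
        return []
    start = min(ts for ts, _, _ in packets)
    slots = defaultdict(lambda: [0, 0, 0, 0])
    for ts, size, direction in packets:
        idx = int((ts - start) / slot_len)   # slot index of this packet
        offset = 0 if direction > 0 else 2   # upstream or downstream counters
        slots[idx][offset] += 1              # packet count
        slots[idx][offset + 1] += size       # byte count
    n_slots = max(slots) + 1
    # Emit empty slots as zeros so the series stays evenly spaced in time.
    return [tuple(slots[i]) if i in slots else (0, 0, 0, 0)
            for i in range(n_slots)]


packets = [
    (0.00, 100, 1),
    (0.02, 1500, -1),
    (0.05, 60, 1),
    (0.13, 1500, -1),
    (0.25, 40, 1),
]
in_order = per_slot_features(packets)
shuffled = per_slot_features(list(reversed(packets)))
print(in_order == shuffled)
```

Because each slot stores only sums and counts, swapping two packets within the list does not change the output, which is one plausible reading of how a per-slot view complements an order-sensitive per-packet view.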