In recent years, the rapid rise of video applications has led to an explosion in Internet video traffic, posing severe challenges to network management. Effectively identifying and managing video traffic has therefore become an urgent problem. However, existing video traffic feature extraction methods mainly target traditional packet-level and flow-level features, which results in low video traffic identification accuracy. In addition, video traffic identification often suffers from high data dimensionality, which calls for an effective approach to selecting the features most relevant to the identification task. Although numerous studies have used feature selection to improve identification performance, none has focused on measuring the distance between feature distributions that do not overlap or overlap only slightly. First, this study extracts video-related features to construct a large-scale feature set for identifying video traffic. Second, to reduce the cost of video traffic identification and select an effective feature subset, it proposes an adaptive distribution distance-based feature selection (ADDFS) method, which uses the Wasserstein distance to measure the distance between feature distributions. To evaluate the proposal, we collected video traffic from different platforms in a campus network environment and conducted a series of experiments on the resulting data sets. Experimental results suggest that the proposed method achieves high identification performance for both video scene traffic and cloud gaming video traffic. Lastly, a comparison with other feature selection methods shows that ADDFS is a practical feature selection technique not only for video traffic identification but also for general classification tasks.
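The abstract does not spell out how ADDFS works, so the following is only a minimal sketch of the general idea it builds on: scoring each feature by the Wasserstein distance between its class-conditional distributions. It is not the ADDFS algorithm, and its adaptive component is not modeled. The motivation for this distance is that divergences such as the Jensen-Shannon divergence saturate once two distributions stop overlapping, whereas the one-dimensional Wasserstein-1 distance keeps growing with the gap between them, so it can still rank non-overlapping or slightly overlapping feature distributions. The helper names `wasserstein_1d`, `min_max_scale`, and `rank_features`, the min-max scaling step, and averaging the distance over all class pairs are hypothetical choices made for illustration only.

```python
from bisect import bisect_right
from itertools import combinations


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between the empirical distributions of samples u and v.

    In one dimension, W1 equals the area between the two empirical CDFs.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    us, vs = sorted(u), sorted(v)
    points = sorted(set(us) | set(vs))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        fu = bisect_right(us, left) / len(us)
        fv = bisect_right(vs, left) / len(vs)
        total += abs(fu - fv) * (right - left)
    return total


def min_max_scale(column):
    """Rescale values to [0, 1] so distances are comparable across features (assumption)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # a constant feature cannot separate classes
    return [(x - lo) / (hi - lo) for x in column]


def rank_features(X, y):
    """Hypothetical scorer: mean pairwise W1 distance between classes for each feature.

    X is a list of rows (one per flow), y the class label of each row.
    Returns feature indices sorted from most to least separating, plus the scores.
    """
    classes = sorted(set(y))
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    scores = []
    for j in range(len(X[0])):
        column = min_max_scale([row[j] for row in X])
        # Group the scaled values of feature j by class label.
        by_class = {c: [x for x, label in zip(column, y) if label == c] for c in classes}
        pair_dists = [wasserstein_1d(by_class[a], by_class[b])
                      for a, b in combinations(classes, 2)]
        scores.append(sum(pair_dists) / len(pair_dists))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[0.0, 5.0], [0.1, 5.2], [1.0, 5.1], [1.1, 4.9]]
    y = ["video", "video", "other", "other"]
    order, scores = rank_features(X, y)
    print(order, [round(s, 3) for s in scores])
```

Because W1 is expressed in the units of the feature, the sketch rescales every feature to [0, 1] before comparing; without that step, features with large numeric ranges (such as byte counts) would dominate the ranking regardless of how well they separate the classes. A real selector would then keep the top-ranked features, possibly after removing redundant ones.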