There is increasing interest in developing new data-driven models for assessing the performance of communication networks. For many applications, such as network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. The extension addresses the automatic derivation of features, a cornerstone step for applying MBDA to massive amounts of data. The resulting network monitoring approach allows us to detect and diagnose disparate network anomalies, with a data-analysis workflow that combines the advantages of interpretable and interactive models with the power of parallel processing. We apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and largest Wi-Fi trace known to date.