There is increasing interest in developing new data-driven models to assess the performance of communication networks. For many applications, such as network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution for the automatic derivation of features, a cornerstone step in applying MBDA when the amount of data is massive. The resulting network monitoring approach allows us to detect and diagnose disparate network anomalies, with a data-analysis workflow that combines the advantages of interpretable and interactive models with the power of parallel processing. We apply the extended MBDA to two case studies: UGR'16, a benchmark flow-based real-traffic dataset for anomaly detection, and Dartmouth'18, the longest and largest Wi-Fi trace known to date.