Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. One of the primary approaches to anomaly detection is forecasting-based methods. However, extensive real-world network datasets for evaluating forecasting and anomaly detection techniques are lacking, which may lead to overestimated performance of anomaly detection algorithms. This manuscript addresses this gap by introducing a dataset of time series describing the behavior of network entities, collected from the CESNET3 network. The dataset was created from 40 weeks of network traffic of 275 thousand active IP addresses. Because the data originate from an ISP network, they exhibit a high level of variability among network entities, which poses a unique and authentic challenge for forecasting and anomaly detection models. The dataset provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.