As Large-Scale Cloud Systems (LCS) become increasingly complex, effective anomaly detection is critical for ensuring system reliability and performance. However, there is a shortage of large-scale, real-world datasets available for benchmarking anomaly detection methods. To address this gap, we introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console. This dataset comprises 39,365 rows and 117,448 columns of telemetry data. Additionally, we demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process. This study and the accompanying dataset provide a resource for researchers and practitioners in cloud system monitoring. It facilitates more efficient testing of anomaly detection methods in real-world data, helping to advance the development of robust solutions to maintain the health and performance of large-scale cloud infrastructures.
翻译:随着大规模云系统(LCS)日益复杂,有效的异常检测对确保系统可靠性与性能至关重要。然而,当前缺乏可用于异常检测方法基准测试的大规模真实数据集。为填补这一空白,我们引入了来自IBM Cloud的新高维数据集,该数据集采集自IBM Cloud控制台,时间跨度超过4.5个月。该数据集包含39,365行与117,448列的遥测数据。此外,我们展示了机器学习模型在异常检测中的应用,并探讨了此过程中的关键挑战。本研究及配套数据集为云系统监控领域的研究者与实践者提供了资源,有助于在真实数据中更高效地测试异常检测方法,从而推动维护大规模云基础设施健康与性能的稳健解决方案的发展。