Information technology (IT) systems are vital to modern businesses, handling data storage, communication, and process automation. Monitoring these systems is crucial to their proper functioning and efficiency, as it makes it possible to collect extensive observational time series data for analysis. Interest in causal discovery for IT monitoring is growing, because knowing the causal relations between the components of an IT system helps reduce downtime, enhance system performance, and identify the root causes of anomalies and incidents. It also enables proactive prediction of future issues through the analysis of historical data. Despite these potential benefits, applying causal discovery algorithms to IT monitoring data is challenging because of the complexity of the data. For instance, IT monitoring data often contains misaligned time series, sleeping time series, timestamp errors, and missing values. This paper presents case studies on applying causal discovery algorithms to different IT monitoring datasets, highlighting both the benefits and the ongoing challenges.