Internet-based services have seen remarkable success and generate vast numbers of monitored key performance indicators (KPIs) in the form of univariate or multivariate time series. Monitoring and analyzing these time series is crucial for researchers, service operators, and on-call engineers, who need to detect outliers or anomalies that indicate service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues. This review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT Operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. It also explores future directions for real-world and next-generation time series anomaly detection based on recent advances.