Anomaly detection techniques are essential for automating the monitoring of IT systems and operations. In practice, machine learning models are trained on operational data from a specific period of time and then continuously evaluated on newly arriving data. Operational data changes constantly over time, which affects the performance of deployed anomaly detection models. Continuous model maintenance is therefore required to preserve the performance of anomaly detectors. In this work, we analyze two anomaly detection model maintenance techniques that differ in model update frequency: blind model retraining and informed model retraining. We further investigate the effects of updating the model by retraining it on all available data (full-history approach) versus only the newest data (sliding window approach). Finally, we investigate whether a data change monitoring tool can determine when the anomaly detection model needs to be updated through retraining.
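To make the four combinations concrete, the following is a minimal sketch of how the update policies (blind versus informed) and the training-data strategies (full-history versus sliding window) could be wired together. It is an illustration under stated assumptions, not the method evaluated in this work. The function names, the `period` and `window_size` parameters, and the mean-shift check standing in for the data change monitoring tool are all hypothetical. Model training itself is left out, so the sketch only records when a retraining would be triggered and how many batches it would use.

```python
from statistics import mean, stdev


def select_training_data(history, strategy, window_size):
    """Pick the batches used for retraining (hypothetical helper)."""
    if strategy == "full_history":
        return list(history)                # all data seen so far
    if strategy == "sliding_window":
        return list(history[-window_size:])  # only the newest batches
    raise ValueError(f"unknown strategy: {strategy}")


def drift_detected(reference, batch, threshold=3.0):
    """Assumed stand-in for a data change monitor: flags a batch whose mean
    lies more than `threshold` standard errors from the reference mean."""
    ref_mean, ref_std = mean(reference), stdev(reference)
    if ref_std == 0:
        return mean(batch) != ref_mean
    standard_error = ref_std / len(batch) ** 0.5
    return abs(mean(batch) - ref_mean) > threshold * standard_error


def plan_retraining(batches, policy, strategy, period=2, window_size=2):
    """Return (batch_index, number_of_training_batches) per retraining event."""
    history = [batches[0]]
    train = [batches[0]]  # data the current model was trained on
    events = []
    for i, batch in enumerate(batches[1:], start=1):
        history.append(batch)
        if policy == "blind":
            # Blind retraining: update on a fixed schedule, ignoring the data.
            retrain = i % period == 0
        elif policy == "informed":
            # Informed retraining: update only when the monitor reports a change.
            reference = [x for b in train for x in b]
            retrain = drift_detected(reference, batch)
        else:
            raise ValueError(f"unknown policy: {policy}")
        if retrain:
            train = select_training_data(history, strategy, window_size)
            events.append((i, len(train)))
    return events


if __name__ == "__main__":
    # Toy metric stream whose level shifts at the third batch.
    stream = [
        [1.0, 1.1, 0.9, 1.0],
        [1.0, 0.9, 1.1, 1.0],
        [5.0, 5.1, 4.9, 5.0],
        [5.0, 4.9, 5.1, 5.0],
        [5.1, 5.0, 4.9, 5.0],
    ]
    for policy in ("blind", "informed"):
        for strategy in ("full_history", "sliding_window"):
            print(policy, strategy, plan_retraining(stream, policy, strategy))
```

On this toy stream, the blind policy retrains on its fixed schedule whether or not the data changed, while the informed policy retrains only at the level shift. A real deployment would replace the mean-shift check with the chosen monitoring tool and the bookkeeping with actual model fitting.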