Anomaly detection techniques are essential for automating the monitoring of IT systems and operations. They rely on machine learning models that are trained on operational data from a specific period of time and then continuously evaluated on newly arriving data. Operational data changes over time, which affects the performance of deployed anomaly detection models. Continuous model maintenance is therefore required to preserve the performance of anomaly detectors. In this work, we analyze two anomaly detection model maintenance techniques that differ in model update frequency: blind model retraining and informed model retraining. We further investigate the effects of retraining the model on all available data (full-history approach) versus only the newest data (sliding window approach). Finally, we investigate whether a data change monitoring tool can determine when the anomaly detection model needs to be updated through retraining.
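The two design choices named in the abstract, which data to retrain on and when to retrain, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy mean-and-standard-deviation detector, the fixed retraining period, and the mean-shift check standing in for a data change monitoring tool are all hypothetical and are not taken from the work itself.

```python
from statistics import mean, pstdev


def select_training_data(history, strategy, window=100):
    """Choose the data a retraining run would use (hypothetical helper)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if strategy == "full_history":
        # Full-history approach: retrain on everything observed so far.
        return list(history)
    if strategy == "sliding_window":
        # Sliding window approach: retrain on only the newest `window` points.
        return list(history[-window:])
    raise ValueError(f"unknown strategy: {strategy}")


def fit_threshold_detector(train, k=3.0):
    """Toy detector (an assumption, not the paper's model): flag a value
    that lies more than k standard deviations from the training mean."""
    mu = mean(train)
    sigma = pstdev(train) or 1e-9  # avoid a zero-width threshold
    return lambda x: abs(x - mu) > k * sigma


def should_retrain(step, policy, period=50, train=None, recent=None, tol=1.0):
    """Decide whether to retrain at this step.

    blind:    retrain on a fixed schedule, regardless of the data.
    informed: retrain only when a monitor reports a data change; here the
              monitor is a simple mean-shift check (illustrative stand-in).
    """
    if policy == "blind":
        return step > 0 and step % period == 0
    if policy == "informed":
        sigma = pstdev(train) or 1e-9
        return abs(mean(recent) - mean(train)) > tol * sigma
    raise ValueError(f"unknown policy: {policy}")
```

In this sketch a maintenance loop would call `should_retrain` at each step and, when it returns true, refit the detector on the output of `select_training_data`. Blind retraining incurs its cost on every scheduled step, while informed retraining depends on how well the change monitor tracks shifts that actually matter to the detector.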