Anomaly detection in multivariate time series is a major problem in many fields. Anomalies are rare in real data by nature, which makes their detection a challenging, highly imbalanced task for classification algorithms. Methods based on deep neural networks, such as LSTMs, autoencoders, and convolutional autoencoders, have shown positive results on such imbalanced data. However, the major challenge these algorithms face on multivariate time series is that an anomaly can arise from a small subset of the feature set. To boost the performance of these base models, we propose a feature-bagging technique that considers only a subset of features at a time, and we further apply a transformation based on a nested rotation computed from Principal Component Analysis (PCA) to improve the effectiveness and generalization of the approach. To further enhance prediction performance, we propose an ensemble technique that combines multiple base models to reach the final decision. In addition, we propose a semi-supervised approach that uses a logistic regressor to combine the base models' outputs. The proposed methodology is applied to the Skoltech Anomaly Benchmark (SKAB) dataset, which contains time series data related to the flow of water in a closed circuit. The experimental results show that the proposed ensemble technique outperforms the base algorithms: the improvement in anomaly detection accuracy reaches 2% for the unsupervised models and at least 10% for the semi-supervised models.
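The pipeline described in the abstract (feature bagging, a PCA-based rotation, and a combination of the base models' outputs) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the data are synthetic rather than SKAB, the base detector is a simple distance score standing in for the LSTM and autoencoder models, and a single PCA rotation per feature subset replaces the nested rotation, whose details are not given here. The logistic combiner is a small hand-written gradient-descent version, used only to keep the example self-contained.

```python
import numpy as np


def fit_rotated_bag(X_train, n_features, rng):
    """Fit one ensemble member on a random subset of the features."""
    idx = rng.choice(X_train.shape[1], size=n_features, replace=False)
    sub = X_train[:, idx]
    mean = sub.mean(axis=0)
    # PCA via SVD: the rows of vt are the principal axes (an orthogonal basis).
    _, _, vt = np.linalg.svd(sub - mean, full_matrices=False)
    # Spread of the training data along each rotated axis.
    scale = ((sub - mean) @ vt.T).std(axis=0) + 1e-9
    return idx, mean, vt, scale


def score_bag(member, X):
    """Stand-in base detector: squared standardized distance after rotation."""
    idx, mean, vt, scale = member
    rotated = (X[:, idx] - mean) @ vt.T
    return np.sum((rotated / scale) ** 2, axis=1)


def ensemble_scores(members, X):
    """Return one column of anomaly scores per ensemble member."""
    return np.column_stack([score_bag(m, X) for m in members])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(S, y, lr=0.1, epochs=500):
    """Small gradient-descent logistic regression over member scores."""
    mu, sd = S.mean(axis=0), S.std(axis=0) + 1e-9
    Z = (S - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(Z @ w + b) - y
        w -= lr * (Z.T @ err) / len(y)
        b -= lr * err.mean()
    return mu, sd, w, b


def predict_logistic(model, S):
    mu, sd, w, b = model
    return sigmoid(((S - mu) / sd) @ w + b)


# Synthetic data (assumption): anomalies affect only features 0 and 1.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))   # anomaly-free training data
X_test = rng.normal(size=(250, 8))
y_test = np.zeros(250)
y_test[200:] = 1
X_test[200:, :2] += 6.0

# Ten members, each seeing 4 of the 8 features.
members = [fit_rotated_bag(X_train, 4, rng) for _ in range(10)]
S = ensemble_scores(members, X_test)

# Unsupervised combination: average the member scores.
unsupervised = S.mean(axis=1)

# Semi-supervised combination: learn member weights from a labeled half.
labeled = np.arange(len(y_test)) % 2 == 0
model = fit_logistic(S[labeled], y_test[labeled])
proba = predict_logistic(model, S[~labeled])
```

Averaging treats every member equally, whereas the logistic combiner can learn to down-weight members whose feature subsets happen to miss the anomalous features, which is one plausible reading of why a supervised combination step might help.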