A Supervised Embedding and Clustering Anomaly Detection method for classification of Mobile Network Faults

The paper introduces Supervised Embedding and Clustering Anomaly Detection (SEMC-AD), a method designed to efficiently identify faulty alarm logs in a mobile network and to ease the burden of manual monitoring caused by the growing volume of alarm logs. SEMC-AD employs a supervised embedding approach based on deep neural networks, using historical alarm logs and their labels to extract a numerical representation of each log. This addresses the imbalanced classification problem that arises from the small proportion of anomalies in the dataset, and it does so without one-hot encoding. The robustness of the embedding is evaluated by plotting the two most significant principal components of the embedded alarm logs, which reveals that anomalies form distinct clusters with similar embeddings. Multivariate Gaussian clustering is then applied to these components; clusters with a high ratio of anomalies to normal alarms (above 90%) are labeled as the anomaly group. A new alarm log is classified as an anomaly if the two most significant principal components of its embedded vector fall within one of the anomaly-labeled clusters. Performance evaluation shows that SEMC-AD outperforms conventional random forest and gradient boosting methods that do not use embedding: SEMC-AD detects 99% of anomalies, whereas random forest and XGBoost detect only 86% and 81%, respectively. While supervised classification methods may excel on labeled datasets, the results demonstrate that SEMC-AD is more efficient at classifying anomalies in datasets with numerous categorical features, significantly enhancing anomaly detection, reducing operator burden, and improving network maintenance.
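
The cluster-labeling and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the neural-network embedding and the principal component projection, and starts from 2-D points that stand in for the two most significant principal components. The cluster means and covariances, the sample points, and all function names are hypothetical. The sketch also assumes equal mixture weights and reads "above 90%" as the share of anomalies among all logs assigned to a cluster.

```python
import math
from collections import Counter

def gaussian_log_density(point, mean, cov):
    """Log density of a 2-D Gaussian; cov is ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Mahalanobis term via the closed-form inverse of a 2x2 matrix.
    maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * (maha + math.log(det) + 2 * math.log(2 * math.pi))

def assign_cluster(point, clusters):
    """Index of the Gaussian component with the highest log density
    (equal mixture weights are an assumption of this sketch)."""
    return max(range(len(clusters)),
               key=lambda k: gaussian_log_density(point, *clusters[k]))

def label_anomaly_clusters(points, labels, clusters, threshold=0.9):
    """Return the cluster indices whose share of anomalies exceeds threshold."""
    total, anomalous = Counter(), Counter()
    for point, label in zip(points, labels):
        k = assign_cluster(point, clusters)
        total[k] += 1
        anomalous[k] += label  # label is 1 for an anomaly, 0 for normal
    return {k for k in total if anomalous[k] / total[k] > threshold}

def classify(point, clusters, anomaly_clusters):
    """A new log is an anomaly if it falls in an anomaly-labeled cluster."""
    return assign_cluster(point, clusters) in anomaly_clusters

# Hypothetical fitted components: (mean, covariance) pairs.
clusters = [
    ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    ((5.0, 5.0), ((0.5, 0.0), (0.0, 0.5))),
]

# Hypothetical historical logs, already projected to two components.
train_points = [(0.1, -0.2), (-0.5, 0.3), (0.4, 0.1), (0.2, 0.2),
                (4.8, 5.1), (5.2, 4.9), (5.0, 5.3)]
train_labels = [0, 0, 0, 1, 1, 1, 1]

anomaly_clusters = label_anomaly_clusters(train_points, train_labels, clusters)
new_log_is_anomaly = classify((5.1, 4.7), clusters, anomaly_clusters)
normal_log_is_anomaly = classify((0.3, 0.0), clusters, anomaly_clusters)
```

In this toy data the first cluster contains one anomaly among four logs, so it stays unlabeled, while the second cluster contains only anomalies and is labeled as the anomaly group. In practice the component parameters would come from fitting a Gaussian mixture to the projected embeddings rather than being written by hand.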
