System logs are among the most important sources of information for maintaining software systems, which have grown larger and more complex in recent years. Log-based anomaly detection aims to detect system anomalies automatically by analyzing the large volume of logs generated in a short period of time, which is a critical challenge in real-world operations. Previous studies have used a log parser to extract templates from unstructured log data and detected anomalies on the basis of template occurrence patterns. These methods have limitations when logs contain unknown templates. Furthermore, since most log anomalies are known to be point anomalies rather than contextual anomalies, detection methods based on occurrence patterns can introduce unnecessary delays, because they must observe a sequence of log lines before flagging an anomaly. In this paper, we propose LogELECTRA, a new log anomaly detection model that analyzes a single log line more deeply on the basis of self-supervised anomaly detection. LogELECTRA specializes in detecting log anomalies as point anomalies by applying ELECTRA, a natural language processing model, to analyze the semantics of a single log line. In experiments on the public benchmark log datasets BGL, Spirit, and Thunderbird, LogELECTRA outperformed existing state-of-the-art methods.
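To make the single-line, self-supervised idea more concrete, the sketch below shows how ELECTRA-style replaced-token-detection (RTD) training examples could be built from one log line, and how a trained discriminator's per-token "replaced" probabilities might be collapsed into a point-anomaly decision. This is not the LogELECTRA implementation. Several assumptions are loudly flagged: whitespace tokenization; uniform random replacement standing in for ELECTRA's small masked-language-model generator; and the hypothetical helpers `line_anomaly_score` (mean pooling) and `is_point_anomaly` (fixed threshold), since the abstract does not say how token-level outputs are aggregated. The discriminator itself is omitted.

```python
import random
from typing import List, Sequence, Tuple


def tokenize(line: str) -> List[str]:
    # Assumption: plain whitespace tokenization; a real model would use a subword tokenizer.
    return line.split()


def make_rtd_example(
    tokens: Sequence[str],
    vocab: Sequence[str],
    replace_prob: float,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Build one replaced-token-detection example from a single log line.

    Real ELECTRA samples replacements from a small generator MLM; here uniform
    sampling from `vocab` stands in for it (an illustrative simplification).
    Label 1 means "token was replaced", 0 means "original". As in ELECTRA, a
    sampled token that happens to equal the original is labeled original.
    """
    corrupted: List[str] = []
    labels: List[int] = []
    for tok in tokens:
        if rng.random() < replace_prob:
            new_tok = rng.choice(vocab)
            corrupted.append(new_tok)
            labels.append(int(new_tok != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


def line_anomaly_score(replaced_probs: Sequence[float]) -> float:
    """Hypothetical line-level score: mean per-token probability of being replaced."""
    if not replaced_probs:
        return 0.0
    return sum(replaced_probs) / len(replaced_probs)


def is_point_anomaly(replaced_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the line on its own, without waiting for context."""
    return line_anomaly_score(replaced_probs) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # An illustrative BGL-style log line.
    tokens = tokenize("RAS KERNEL INFO instruction cache parity error corrected")
    vocab = ["RAS", "KERNEL", "INFO", "FATAL", "error", "node", "cache"]
    corrupted, labels = make_rtd_example(tokens, vocab, replace_prob=0.15, rng=rng)
    print(corrupted, labels)
    # Made-up discriminator outputs, for illustration only.
    print(is_point_anomaly([0.1, 0.2, 0.9, 0.9]))
```

The point of the design is that the decision depends only on the tokens of one line, which is why such a detector can flag a point anomaly as soon as the line arrives rather than after a window of log lines has accumulated.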