This paper introduces LogLead, a tool designed for efficient log analysis benchmarking. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool is built on Polars, a high-speed DataFrame library. It currently provides loaders for eight publicly available systems (HDFS, Hadoop, BGL, Thunderbird, Spirit, Liberty, TrainTicket, and GC Webshop). Its enhancers include three parsers (Drain, Spell, LenMa), BERT embedding creation, and other log representation techniques such as bag-of-words. For anomaly detection, LogLead integrates with five supervised and four unsupervised machine learning algorithms from SKLearn. By integrating diverse datasets, log representation methods, and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We show that loading logs from raw file to dataframe is over 10x faster with LogLead than with past solutions. We demonstrate a roughly 2x improvement in Drain parsing speed by off-loading log message normalization to LogLead. Our brief benchmarking on HDFS indicates that log representations extending beyond the bag-of-words approach offer limited additional benefits. Tool URL: https://github.com/EvoTestOps/LogLead
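The three steps named above (loading, enhancing, anomaly detection) can be illustrated with a minimal sketch in plain Python. This is not LogLead's API: the log lines, the function names, and the `<IP>`/`<NUM>` masks are hypothetical, and a simple word-rarity score stands in for the SKLearn detectors that the tool actually integrates. It only shows how normalization and a bag-of-words representation feed an unsupervised detector.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log lines in a "<sequence id> <message>" layout (not a real dataset).
RAW_LOGS = [
    "blk_1 Receiving block from 10.0.0.1",
    "blk_1 Received block of size 1024",
    "blk_2 Receiving block from 10.0.0.2",
    "blk_2 Received block of size 2048",
    "blk_3 Receiving block from 10.0.0.3",
    "blk_3 Exception while serving block",
]


def load(lines):
    """Loading: split each raw line into a (sequence id, message) record."""
    return [tuple(line.split(" ", 1)) for line in lines]


def normalize(message):
    """Enhancing, step 1: mask variable parts such as IP addresses and numbers."""
    message = re.sub(r"\d+\.\d+\.\d+\.\d+", "<IP>", message)
    return re.sub(r"\d+", "<NUM>", message)


def bag_of_words(records):
    """Enhancing, step 2: build one word-count vector (a Counter) per sequence."""
    bags = defaultdict(Counter)
    for seq_id, message in records:
        bags[seq_id].update(normalize(message).lower().split())
    return dict(bags)


def rarity_scores(bags):
    """Anomaly detection (stand-in): share of a sequence's distinct words
    that appear in no other sequence."""
    # Iterating a Counter yields its distinct keys, so this is a document frequency.
    doc_freq = Counter(word for bag in bags.values() for word in bag)
    return {
        seq_id: sum(1 for word in bag if doc_freq[word] == 1) / len(bag)
        for seq_id, bag in bags.items()
    }


records = load(RAW_LOGS)
bags = bag_of_words(records)
scores = rarity_scores(bags)
most_anomalous = max(scores, key=scores.get)
print(most_anomalous)  # blk_3
```

Masking addresses and numbers before counting words is what lets the two normal sequences collapse onto the same vocabulary, so only the sequence containing the exception message receives a non-zero score.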