This paper introduces LogLead, a tool designed for efficient log analysis. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool leverages Polars, a high-speed DataFrame library. We currently have seven loaders, four of which are for public datasets (HDFS, Hadoop, BGL, and Thunderbird). We have multiple enhancers, including three parsers (Drain, Spell, LenMa), BERT embedding creation, and other log representation techniques such as bag-of-words. For anomaly detection, LogLead integrates five supervised and four unsupervised machine learning algorithms from scikit-learn (SKLearn). By integrating diverse datasets, log representation methods, and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We demonstrate that loading logs from raw files into a DataFrame is over 10x faster with LogLead than with past solutions. We also demonstrate a roughly 2x improvement in Drain parsing speed by offloading log message normalization to LogLead. Finally, a brief benchmark on HDFS suggests that log representations beyond bag-of-words provide limited benefits. Screencast demonstrating the tool: https://youtu.be/8stdbtTfJVo
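To make the "enhancing" step concrete, the sketch below illustrates two of the operations mentioned above: normalizing log messages by masking variable tokens (the kind of preprocessing that can be done before Drain parsing) and building a bag-of-words representation per HDFS block. It is a minimal sketch, not LogLead's implementation or API: it uses only the Python standard library instead of Polars, and the masking rules, the helper names `normalize`, `block_id`, and `bag_of_words`, and the sample lines are hypothetical illustrations.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order; NOT LogLead's actual rules.
# Block ids are masked first so their digits are not caught by the number rule.
MASKS = [
    (re.compile(r"blk_-?\d+"), "<BLK>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b"), "<IP>"),  # IPv4 with optional port
    (re.compile(r"\b\d+\b"), "<NUM>"),                              # remaining bare integers
]
BLOCK_RE = re.compile(r"blk_-?\d+")


def normalize(message: str) -> str:
    """Replace variable tokens (block ids, IPs, numbers) with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def block_id(message: str):
    """Return the first HDFS block id in the raw message, or None."""
    match = BLOCK_RE.search(message)
    return match.group(0) if match else None


def bag_of_words(lines):
    """Group raw lines by block id and count normalized tokens per block."""
    bags = {}
    for line in lines:
        blk = block_id(line)  # extracted from the raw line, before masking
        if blk is None:
            continue
        bags.setdefault(blk, Counter()).update(normalize(line).split())
    return bags


if __name__ == "__main__":
    sample = [
        "Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106 dest: /10.250.19.102:50010",
        "PacketResponder 1 for block blk_-1608999687919862906 terminating",
        "Receiving block blk_7503483334202473044 src: /10.251.215.16:55695 dest: /10.251.215.16:50010",
    ]
    for blk, counts in bag_of_words(sample).items():
        print(blk, dict(counts))
```

Each resulting per-block `Counter` can be turned into a fixed-length vector and passed to a supervised or unsupervised detector; doing the masking once up front, rather than inside the parser, is the kind of offloading the abstract credits for the Drain speedup.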