Log analysis and monitoring are essential to software maintenance and defect identification. In particular, the temporal nature and vast size of log data lead to an interesting and important research question: How can logs be summarised and monitored over time? While this has been a fundamental topic of research in the software engineering community, prior work has typically focused on heuristic-, syntax-, or static-based methods. In this work, we propose an online, semantic-based clustering approach for error logs that dynamically updates the log clusters to enable monitoring of code error life-cycles. We also introduce a novel metric to evaluate the performance of temporal log clusters. We test our system and evaluation metric on an industrial dataset and find that our solution outperforms similar systems. We hope that our work encourages further temporal exploration of defect datasets.