Configurable software systems are prone to configuration errors, which cause significant losses to companies. Diagnosing these errors is difficult because the configuration space is vast and complex, and the difficulty affects both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Because logs are easily accessible to most end-users, we conduct a preliminary study to outline the challenges and opportunities of using logs to localize configuration errors. Based on the insights from this study, we propose a two-stage strategy, built on Large Language Models (LLMs), that lets end-users localize root-cause configuration properties from logs. We implement this strategy in a tool, LogConfigLocalizer, to assist end-users in coping with configuration errors through log analysis. To the best of our knowledge, this is the first work to localize root-cause configuration properties for end-users based on LLMs and logs. We evaluate the strategy on Hadoop using LogConfigLocalizer, where it achieves an average accuracy as high as 99.91%. We also demonstrate the effectiveness and necessity of the individual phases of the methodology by comparing it with two variants and a baseline tool. Finally, we validate the methodology through a practical case study that demonstrates its effectiveness and feasibility.