Recently, Large Language Model (LLM)-based Fault Localization (FL) techniques have been proposed and have shown improved performance, along with explanations of their FL results. However, a major issue with LLM-based FL techniques is their heavy reliance on LLMs, which are often unreliable, expensive, and difficult to analyze or improve. When results are unsatisfactory, it is challenging both to determine the cause and to refine the technique for better outcomes. To address this issue, we propose LogicFL, a novel logical fault localization technique for Null Pointer Exceptions (NPEs). Using logic programming, LogicFL imitates the deductive process that human developers follow during fault localization, and identifies the causes of NPEs through logical inference over collected facts about the faulty code and test execution. In an empirical evaluation on 76 NPE bugs from Apache Commons projects and the Defects4J benchmark, LogicFL accurately identified the fault locations and pinpointed the exact code fragments causing the NPEs for 67 bugs (88.16%), which is 19.64% and 4.69% more bugs than the two compared LLM-based FL techniques, respectively. In addition, LogicFL can run on a low-performance machine comparable to a typical laptop, with an average runtime of 21.63 seconds and a worst-case runtime of under two minutes, including test execution and output file generation. Moreover, compared with the two LLM-based FL techniques using the GPT-4o model, LogicFL was significantly more cost-efficient: those techniques cost 343.94 and 3,736.19 times as much as LogicFL, respectively. Last but not least, LogicFL's deduction process for producing FL results is fully traceable, which enables us to understand the reasoning behind the technique's outcomes and to further enhance the technique.
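As a rough illustration of what "logical inference over collected facts" can look like, the sketch below runs naive Datalog-style forward chaining until a fixpoint is reached. The fact vocabulary (`assign`, `deref`, `npe`, `null_value`, `cause`) and the three rules are hypothetical assumptions chosen for illustration; they are not LogicFL's actual fact schema or rule set, and the propagation rule is deliberately flow-insensitive, which a real analysis would need to refine.

```python
# Minimal forward-chaining sketch over HYPOTHETICAL facts (not LogicFL's schema):
#   ("assign", var, value, line): `var = value` executed at `line`
#   ("deref", var, line):         `var` is dereferenced at `line`
#   ("npe", line):                the failing test reports an NPE at `line`
# Derived facts:
#   ("null_value", var, src_line):       `var` may hold null originating at `src_line`
#   ("cause", var, src_line, npe_line):  explanation linking the NPE to its null source


def derive(facts):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    while True:
        new = set()
        # Rule 1: null_value(V, L) :- assign(V, "null", L).
        for f in known:
            if f[0] == "assign" and f[2] == "null":
                new.add(("null_value", f[1], f[3]))
        # Rule 2 (flow-insensitive copy propagation):
        #   null_value(V, Src) :- assign(V, W, _), null_value(W, Src).
        for f in known:
            if f[0] == "assign":
                _, var, src_var, _line = f
                for g in known:
                    if g[0] == "null_value" and g[1] == src_var:
                        new.add(("null_value", var, g[2]))
        # Rule 3: cause(V, Src, N) :- npe(N), deref(V, N), null_value(V, Src).
        for f in known:
            if f[0] == "npe":
                npe_line = f[1]
                for d in known:
                    if d[0] == "deref" and d[2] == npe_line:
                        for n in known:
                            if n[0] == "null_value" and n[1] == d[1]:
                                new.add(("cause", d[1], n[2], npe_line))
        if new <= known:  # fixpoint: nothing new was derived
            return known
        known |= new


# Usage: `a` is set to null at line 3, copied into `b` at line 5,
# and `b` is dereferenced at line 7, where the NPE is reported.
facts = {
    ("assign", "a", "null", 3),
    ("assign", "b", "a", 5),
    ("assign", "c", '"x"', 4),
    ("deref", "c", 6),
    ("deref", "b", 7),
    ("npe", 7),
}
result = derive(facts)
causes = sorted(f for f in result if f[0] == "cause")
print(causes)  # → [('cause', 'b', 3, 7)]
```

Because every derived fact follows from named rules applied to explicit facts, the chain `assign(a, null, 3)` → `null_value(b, 3)` → `cause(b, 3, 7)` can be replayed step by step, which is the kind of traceability the abstract contrasts with LLM-based FL.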