Large Language Models (LLMs) have shown a surprising level of performance on multiple software engineering problems. However, they have not yet been applied to Fault Localization (FL), in which one must find the code element responsible for a bug in a potentially vast codebase. Applying LLMs to FL nonetheless has the potential to benefit developers in terms of both performance and explainability. In this work, we present AutoFL, an automated fault localization technique that requires only a single failing test and, as part of its fault localization process, generates an explanation of why the given test fails. Using the function calling API of OpenAI's LLM, ChatGPT, we provide tools that allow it to explore a large source code repository; without such tools, the repository would pose a significant challenge, as its source code could not all fit within the limited prompt length. Our results indicate that, on the widely used Defects4J benchmark, AutoFL identifies the faulty method on the first try more often than all standalone techniques from prior work that we compared against. Nonetheless, there is ample room to improve performance, and we encourage further experimentation with language model-based FL as a promising research area.
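To make the tool-based exploration described above more concrete, the following is a minimal sketch of how function call requests might be dispatched to repository navigation tools. It is an illustration under stated assumptions, not AutoFL's implementation: the tool names (`list_classes`, `list_methods`, `get_code_snippet`), the in-memory `REPO`, and the scripted requests are all hypothetical, and no real LLM API is called.

```python
import json

# Hypothetical in-memory stand-in for a repository: class -> method -> source.
REPO = {
    "Calculator": {
        "add": "int add(int a, int b) { return a - b; }",
        "mul": "int mul(int a, int b) { return a * b; }",
    },
    "Parser": {
        "parse": "int parse(String s) { return Integer.parseInt(s); }",
    },
}


def list_classes():
    """Return the names of all classes in the repository."""
    return sorted(REPO)


def list_methods(class_name):
    """Return the method names of one class (empty list if unknown)."""
    return sorted(REPO.get(class_name, {}))


def get_code_snippet(class_name, method_name):
    """Return the source of one method, so only that piece enters the prompt."""
    return REPO.get(class_name, {}).get(method_name, "not found")


# Registry mapping tool names (as a model would request them) to functions.
TOOLS = {
    "list_classes": list_classes,
    "list_methods": list_methods,
    "get_code_snippet": get_code_snippet,
}


def dispatch(call):
    """Run one request of the form {"name": ..., "arguments": "<JSON string>"}."""
    func = TOOLS.get(call["name"])
    if func is None:
        return json.dumps({"error": "unknown tool " + call["name"]})
    args = json.loads(call["arguments"] or "{}")
    return json.dumps(func(**args))


# Scripted requests standing in for what a model might ask for, step by step.
scripted_calls = [
    {"name": "list_classes", "arguments": "{}"},
    {"name": "list_methods", "arguments": json.dumps({"class_name": "Calculator"})},
    {
        "name": "get_code_snippet",
        "arguments": json.dumps({"class_name": "Calculator", "method_name": "add"}),
    },
]

# Each result would be sent back to the model as the tool's reply.
transcript = [dispatch(call) for call in scripted_calls]
```

The point of this design is that the model only ever sees the small pieces of the repository it asks for, which is how tool access can sidestep the prompt length limit mentioned above.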