Large Language Models (LLMs) have shown a surprising level of performance on multiple software engineering problems. However, they have not yet been applied to fault localization (FL), in which one must find the code element responsible for a bug from a potentially vast software repository. Nonetheless, LLM application to FL has the potential to benefit developers both in terms of performance and explainability. In this work, we present AutoFL, an automated fault localization technique that only requires a single failing test, and in its fault localization process generates an explanation about why the given test fails. Using the function call API of the ChatGPT large language model, we provide tools that allow it to explore a large source code repository, which would otherwise pose a significant challenge as it would be impossible to fit all the source code within the limited prompt length. Our results indicate that on the widely used Defects4J benchmark, AutoFL could identify the faulty method on the first try more often than all standalone techniques we compared against from prior work. Nonetheless, there is ample room to improve performance, and we encourage the further experimentation of language model-based fault localization as a promising research area.
翻译:大语言模型(LLMs)已展现出在多个软件工程问题上的出色表现。然而,它们尚未被应用于故障定位(FL)——即从潜在庞大的软件仓库中找出引发错误的代码元素。尽管如此,将大语言模型应用于故障定位有望在性能和可解释性两方面为开发者带来助益。本研究提出AutoFL,一种仅需单个失败测试用例的自动化故障定位技术,其定位过程中会生成关于测试失败原因的解释。我们利用ChatGPT大语言模型的函数调用接口,提供可探索大规模源代码仓库的工具——这原本面临重大挑战,因为将全部源代码容纳进有限的提示长度中是不可能的。结果表明,在广泛使用的Defects4J基准测试集上,AutoFL首次尝试即可定位故障方法的频率高于我们对比的所有先前工作中的独立技术。然而,性能仍存在大幅提升空间,我们鼓励将基于语言模型的故障定位作为极具前景的研究方向进行进一步实验。