Bugs are notoriously challenging: they slow down software users and result in time-consuming investigations for developers. These challenges are exacerbated when bugs must be reported in natural language by users. Indeed, we lack reliable tools to automatically address reported bugs (i.e., enabling their analysis, reproduction, and fixing). Given the recent promise shown by LLMs such as ChatGPT on various tasks, including software engineering tasks, we ask: what if ChatGPT could understand bug reports and reproduce them? This question is the main focus of this study. To evaluate whether ChatGPT can capture the semantics of bug reports, we used the popular Defects4J benchmark together with its bug reports. Our study shows that ChatGPT was able to demystify and reproduce 50% of the reported bugs. That ChatGPT can automatically address half of the reported bugs shows promising potential for applying machine learning to bug handling, with a human in the loop needed only to report the bug.