Recent advances in large language models (LLMs) have yielded strong results across a wide range of general-domain tasks, such as question answering and instruction following. LLMs have also shown potential in various software engineering applications. In this study, we present a systematic comparison of test suites generated by the ChatGPT LLM and by EvoSuite, a state-of-the-art search-based software testing (SBST) tool. Our comparison is based on several critical factors, including correctness, readability, code coverage, and bug detection capability. By highlighting the strengths and weaknesses of LLMs (specifically ChatGPT) relative to EvoSuite in generating unit test cases, this work provides insight into how well LLMs perform on software engineering problems. Overall, our findings underscore the potential of LLMs in software engineering and pave the way for further research in this area.