In recent years, large language models (LLMs) such as ChatGPT have been pivotal in advancing artificial intelligence applications, including natural language processing and software engineering. A promising yet underexplored area is the use of LLMs in software testing, particularly black-box testing. This paper compares the test cases devised by ChatGPT with those created by human participants. In this study, ChatGPT (GPT-4) and four participants each created black-box test cases for three applications based on specifications written by the authors. The goal was to evaluate the real-world applicability of the proposed test cases, identify potential shortcomings, and understand how ChatGPT could enhance human testing strategies. We found that ChatGPT can generate test cases that generally match or slightly surpass those created by human participants in terms of test viewpoint coverage. Additionally, our experiments demonstrated that when ChatGPT cooperates with humans, together they can cover considerably more test viewpoints than either can achieve alone, suggesting that collaboration between humans and ChatGPT may be more effective than pairs of humans working together. Nevertheless, we noticed that the test cases generated by ChatGPT have certain issues that must be addressed before use.