ChatGPT and Human Synergy in Black-Box Testing: A Comparative Analysis

In recent years, large language models (LLMs) such as ChatGPT have been pivotal in advancing artificial intelligence applications, including natural language processing and software engineering. A promising yet underexplored area is the use of LLMs in software testing, particularly black-box testing. This paper compares the test cases devised by ChatGPT with those created by human participants. In this study, ChatGPT (GPT-4) and four participants each created black-box test cases for three applications based on specifications written by the authors. The goal was to evaluate the real-world applicability of the proposed test cases, identify potential shortcomings, and understand how ChatGPT could enhance human testing strategies. The results show that ChatGPT can generate test cases that generally match or slightly surpass those created by human participants in terms of test viewpoint coverage. Additionally, our experiments demonstrated that when ChatGPT cooperates with humans, the combination covers considerably more test viewpoints than either can achieve alone, suggesting that collaboration between humans and ChatGPT may be more effective than pairs of humans working together. Nevertheless, the test cases generated by ChatGPT have certain issues that must be addressed before use.
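One way to read the coverage comparison is as set arithmetic over test viewpoints: each tester covers some subset of a reference list, and a collaboration covers the union. The sketch below is only an illustration of that reading. The viewpoint labels, the `coverage` helper, and the sets are hypothetical and are not data or definitions from the study.

```python
def coverage(covered, all_viewpoints):
    """Fraction of the reference viewpoints hit by a set of test cases."""
    return len(covered & all_viewpoints) / len(all_viewpoints)


# Hypothetical viewpoint labels for one specification (not from the paper).
all_viewpoints = {
    "valid_input", "empty_input", "boundary_min",
    "boundary_max", "invalid_type", "error_message",
}

# Hypothetical viewpoints covered by each tester working alone.
human = {"valid_input", "empty_input", "boundary_min", "error_message"}
chatgpt = {"valid_input", "boundary_min", "boundary_max", "invalid_type"}

# A collaboration covers whatever either side covers.
combined = human | chatgpt

for name, covered in [("human", human), ("chatgpt", chatgpt), ("combined", combined)]:
    print(f"{name}: {coverage(covered, all_viewpoints):.2f}")
```

Under this reading, the combined coverage can never fall below either individual coverage, and it gains the most when the two testers miss different viewpoints.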

