The integration of Large Language Models (LLMs) and chatbots introduces new challenges and opportunities for decision-making in software testing. Decision-making relies on a variety of information, including code, requirements specifications, and other software artifacts that are often unclear or exist solely in the developer's mind. To fill the gaps left by unclear information, testers often rely on assumptions, intuition, or previous experience. This paper explores the potential of LLM-based chatbots such as Bard, Copilot, and ChatGPT to support software testers in test decisions such as effective test-case prioritization. We investigate whether LLM-based chatbots and human testers share similar "assumptions" or intuition in prohibitive testing scenarios, where exhaustive execution of test cases is often impractical. Preliminary results from a survey of 127 testers indicate a preference for diverse test scenarios, with a significant majority (96%) favoring dissimilar test sets. Interestingly, two of the four chatbots mirrored this preference, aligning with human intuition, while the others opted for similar test scenarios, a choice made by only 3.9% of testers. Our initial insights suggest a promising avenue for enhancing the collaborative dynamics between testers and chatbots.