In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches to mutation testing apply a fixed set of mutation operators, e.g., replacing a "+" with a "-" or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, which limits their effectiveness. This paper presents a technique in which placeholders are inserted into source code and a Large Language Model (LLM) is prompted to suggest what they could be replaced with; each suggestion yields a candidate mutation. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy and using several LLMs. We find that LLMorpheus can produce mutants that resemble existing bugs and that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
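The placeholder idea described above can be illustrated with a minimal TypeScript sketch. It is not LLMorpheus's actual implementation or API: the names `PLACEHOLDER`, `Span`, `insertPlaceholder`, and `buildMutants` are hypothetical, the placeholder token is an assumption, and the LLM response is replaced by a hard-coded list of suggestions.

```typescript
// Illustrative sketch only; all names here are hypothetical.
const PLACEHOLDER = "<PLACEHOLDER>";

// Half-open character range [start, end) in the source text.
interface Span {
  start: number;
  end: number;
}

// Replace the code in the given span with a placeholder. The result is the
// code fragment that would be embedded in a prompt sent to the LLM.
function insertPlaceholder(source: string, span: Span): string {
  return source.slice(0, span.start) + PLACEHOLDER + source.slice(span.end);
}

// Build one mutant per suggested replacement, skipping any suggestion that
// merely reproduces the original code (it would not be a mutation).
function buildMutants(source: string, span: Span, suggestions: string[]): string[] {
  const original = source.slice(span.start, span.end).trim();
  return suggestions
    .filter((s) => s.trim() !== original)
    .map((s) => source.slice(0, span.start) + s + source.slice(span.end));
}

const source = "function inRange(i, n) { return i >= 0 && i < n; }";
const target = "i < n";
const start = source.indexOf(target);
const span: Span = { start, end: start + target.length };

const promptCode = insertPlaceholder(source, span);

// Stand-in for an LLM response (assumption: hard-coded for this sketch).
const suggestions = ["i <= n", "i < n", "i < n - 1"];
const mutants = buildMutants(source, span, suggestions);

console.log(promptCode);
for (const m of mutants) console.log(m);
```

Each resulting mutant would then be run against the test suite; a mutant that no test detects points to a gap in the tests. Unlike a fixed operator such as swapping "+" for "-", the replacements here are not limited to a predefined set.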