Modern software systems rely heavily on Web APIs, yet creating meaningful and executable test scripts remains a largely manual, time-consuming, and error-prone task. In this paper, we present APITestGenie, a novel tool that leverages Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and prompt engineering to automatically generate API integration tests directly from business requirements and OpenAPI specifications. We evaluated APITestGenie on 10 real-world APIs, including 8 APIs comprising approximately 1,000 live endpoints from an industrial partner in the automotive domain. The tool generated syntactically and semantically valid test scripts for 89% of the business requirements under test within at most three attempts. Notably, some generated tests revealed previously unknown defects in the APIs, including integration issues between endpoints. Statistical analysis identified API complexity and the level of detail in business requirements as the primary factors influencing success rates, with the level of detail in API documentation also affecting outcomes. Feedback from industry practitioners confirmed strong interest in adoption and indicated that the tool can substantially reduce the manual effort of writing acceptance tests and improve the alignment between tests and business requirements.