Test scenarios are specific instances of test cases that describe actions to validate a particular software functionality. By outlining the conditions under which the software operates and the expected outcomes, test scenarios ensure that the software functionality is tested in an integrated manner. Test scenarios are crucial for systematically testing an application under various conditions, including edge cases, to identify potential issues and guarantee overall performance and reliability. Specifying test scenarios is tedious and requires a deep understanding of software functionality and the underlying domain. It further demands substantial effort and investment from already time- and budget-constrained requirements engineers and testing teams. This paper presents an automated approach (RAGTAG) for test scenario generation using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). RAG allows the integration of specific domain knowledge with LLMs' generation capabilities. We evaluate RAGTAG on two industrial projects from Austrian Post with bilingual requirements in German and English. Our results from an interview survey conducted with four experts on five dimensions -- relevance, coverage, correctness, coherence, and feasibility -- affirm the potential of RAGTAG in automating test scenario generation. Specifically, our results indicate that, despite the difficult task of analyzing bilingual requirements, RAGTAG is able to produce scenarios that are well-aligned with the underlying requirements and provide coverage of different aspects of the intended functionality. The generated scenarios are easily understandable to experts and feasible for testing in the project environment. The overall correctness is deemed satisfactory; however, gaps in capturing exact action sequences and domain nuances remain, underscoring the need for domain expertise when applying LLMs.