Test scenarios are specific instances of test cases that describe actions to validate a particular software functionality. By outlining the conditions under which the software operates and the expected outcomes, test scenarios ensure that the software functionality is tested in an integrated manner. Test scenarios are crucial for systematically testing an application under various conditions, including edge cases, to identify potential issues and guarantee overall performance and reliability. Specifying test scenarios is tedious and requires a deep understanding of software functionality and the underlying domain. It further demands substantial effort and investment from already time- and budget-constrained requirements engineers and testing teams. This paper presents an automated approach (RAGTAG) for test scenario generation using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). RAG allows the integration of specific domain knowledge with LLMs' generation capabilities. We evaluate RAGTAG on two industrial projects from Austrian Post with bilingual requirements in German and English. Our results from an interview survey conducted with four experts on five dimensions -- relevance, coverage, correctness, coherence, and feasibility -- affirm the potential of RAGTAG in automating test scenario generation. Specifically, our results indicate that, despite the difficult task of analyzing bilingual requirements, RAGTAG is able to produce scenarios that are well-aligned with the underlying requirements and provide coverage of different aspects of the intended functionality. The generated scenarios are easily understandable to experts and feasible for testing in the project environment. The overall correctness is deemed satisfactory; however, gaps in capturing exact action sequences and domain nuances remain, underscoring the need for domain expertise when applying LLMs.