[Context]: Companies increasingly recognize the value of automating Requirements Engineering (RE) tasks because these tasks are resource-intensive. The advent of Generative AI (GenAI) has made such tasks more amenable to automation, owing to its ability to understand and interpret context effectively. [Problem]: However, with GenAI, prompt engineering is a critical factor for success, and we currently lack tools and methods to systematically assess which prompt patterns are most effective for a particular RE task. [Method]: Two RE tasks, requirement classification and requirement tracing, were automated using the GPT-3.5 turbo API. The evaluation assessed prompts built from 5 prompt patterns, each implemented programmatically to perform the selected RE tasks, and measured their performance in terms of precision, recall, accuracy, and F-Score. [Results]: This paper evaluates how effectively each of the 5 prompt patterns enables GPT-3.5 turbo to perform the selected RE tasks and recommends which prompt pattern to use for a specific RE task. It also provides an evaluation framework that researchers and practitioners can use as a reference when evaluating different prompt patterns for different RE tasks.
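The four metrics named in the abstract can be computed per prompt pattern once model outputs are parsed into labels. The following is a minimal sketch, not the paper's implementation. It assumes each requirement (or, for tracing, each candidate requirement pair framed as link / no-link) has a gold label and a label parsed from the model's response. The function name `evaluate`, the labels `"F"`/`"NF"`, and the choice of a single positive class are hypothetical. F-Score is taken here to mean F1, the harmonic mean of precision and recall.

```python
from typing import Sequence


def evaluate(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> dict[str, float]:
    """Compute precision, recall, accuracy and F1 for one positive class.

    Hypothetical helper: y_true holds gold labels, y_pred holds labels
    parsed from the model's responses (parsing is not shown here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the chosen positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Illustrative labels only ("F" = functional, "NF" = non-functional).
    gold = ["F", "NF", "F", "NF", "F"]
    predicted = ["F", "F", "F", "NF", "NF"]
    print(evaluate(gold, predicted, positive="F"))
```

Running the same `evaluate` call on the outputs of each prompt pattern yields directly comparable scores. For multi-class classification, one would typically compute per-class scores and then average them (macro or micro), which this sketch leaves out.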