Generative artificial intelligence (GAI), specifically large language models (LLMs), is increasingly used in software engineering, mainly for coding tasks. However, requirements engineering - particularly requirements validation - has seen limited application of GAI. The current focus of applying GAI to requirements is on eliciting, transforming, and classifying requirements, not on quality assessment. We propose and evaluate "DeepQuali", an LLM-based (GPT-4o) approach for assessing and improving requirements quality in agile software development. We applied it to projects in two small companies, where we compared LLM-based quality assessments with expert judgments. Experts also participated in walkthroughs of the solution, provided feedback, and rated their acceptance of the approach. Experts largely agreed with the LLM's quality assessments, especially regarding overall ratings and explanations. However, they did not always agree with one another on detailed ratings, suggesting that expertise and experience may influence judgments. Experts recognized the usefulness of the approach but criticized the lack of integration into their workflow. LLMs show potential in supporting software engineers with the quality assessment and improvement of requirements. The explicit use of quality models and explanatory feedback increases acceptance.