Exploring LLMs for Verifying Technical System Specifications Against Requirements

Requirements engineering is a knowledge-intensive process that is crucial for the success of engineering projects. The field of knowledge-based requirements engineering (KBRE) aims to support engineers by providing knowledge that assists in the elicitation, validation, and management of system requirements. The advent of large language models (LLMs) opens new opportunities for KBRE. This work experimentally investigates the potential of LLMs in requirements verification. In the experiments, LLMs are given a set of requirements and a textual system specification and are prompted to assess which requirements the specification fulfills. The experimental variables analyzed include system specification complexity, the number of requirements, and the prompting strategy. Formal rule-based systems serve as the benchmark against which LLM performance is compared. Requirements and system specifications are drawn from the smart-grid domain. Results show that advanced LLMs such as GPT-4o and Claude 3.5 Sonnet achieve F1 scores between 79% and 94% in identifying non-fulfilled requirements, indicating that LLMs have potential for requirements verification.
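To make the reported metric concrete, the following is a minimal sketch of how an F1 score for the "non-fulfilled" class could be computed by comparing an LLM's verdicts against a reference verdict set, such as one produced by a rule-based benchmark. The function name `f1_non_fulfilled` and the requirement IDs are hypothetical illustrations, not taken from the paper, and the paper's actual evaluation procedure may differ.

```python
def f1_non_fulfilled(reference, predicted):
    """Return (precision, recall, f1) for the 'non-fulfilled' class.

    reference: IDs of requirements the benchmark marks as non-fulfilled.
    predicted: IDs of requirements the LLM marks as non-fulfilled.
    Both arguments are assumptions made for this illustration.
    """
    reference, predicted = set(reference), set(predicted)
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data: the benchmark flags four requirements,
# and the LLM flags four, three of which agree with the benchmark.
reference = {"R2", "R5", "R7", "R9"}
predicted = {"R2", "R5", "R7", "R8"}

precision, recall, f1 = f1_non_fulfilled(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up case, three of the four predicted IDs are correct and three of the four reference IDs are found, so precision, recall, and F1 all equal 0.75. Treating "non-fulfilled" as the positive class mirrors the way the abstract reports its scores.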

