Natural Language Processing for Requirements Formalization: How to Derive New Approaches?

Industry and research have long sought to automate the software development and testing process as far as possible. In this process, requirements engineering (RE) plays a fundamental role, since all subsequent steps build on it. Model-based design and testing methods have been developed to handle the growing complexity and variability of software systems. However, considerable effort is still required to create specification models from a large set of functional requirements written in natural language. Numerous approaches based on natural language processing (NLP) have been proposed in the literature to generate requirements models, mainly using syntactic properties. Recent advances in NLP show that semantic quantities can also be identified and used to provide better assistance in the requirements formalization process. In this work, we present and discuss principal ideas and state-of-the-art methodologies from the field of NLP in order to guide readers in creating a set of rules and methods for the semi-automated formalization of requirements according to their specific use case and needs. We discuss two different approaches in detail and highlight the iterative development of rule sets. The requirements models are represented as pseudocode, a format that is both human- and machine-readable. The presented methods are demonstrated on two industrial use cases from the automotive and railway domains. The results show that using current pre-trained NLP models reduces the effort needed to create a set of rules and that the approach can be easily adapted to specific use cases and domains. In addition, findings and shortcomings of this research area are highlighted, and an outlook on possible future developments is given.
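
To make the idea of a rule set that maps natural-language requirements to pseudocode more concrete, the following is a minimal sketch and not the method presented in this work. It assumes a toy rule set of two regular-expression patterns as a stand-in for the syntactic and semantic analysis that a pre-trained NLP model would provide. All names (`CONDITIONAL_RULE`, `UNCONDITIONAL_RULE`, `Formalized`, `formalize`), the example requirements, and the pseudocode layout are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule 1: "If/When <condition>, [then] <subject> shall|must <action>."
CONDITIONAL_RULE = re.compile(
    r"^(?:if|when)\s+(?P<condition>.+?),\s*(?:then\s+)?"
    r"(?P<subject>.+?)\s+(?:shall|must)\s+(?P<action>.+?)\.?$",
    re.IGNORECASE,
)

# Hypothetical rule 2: "<subject> shall|must <action>."
UNCONDITIONAL_RULE = re.compile(
    r"^(?P<subject>.+?)\s+(?:shall|must)\s+(?P<action>.+?)\.?$",
    re.IGNORECASE,
)


@dataclass
class Formalized:
    """Structured form of one requirement (illustrative only)."""
    condition: Optional[str]
    subject: str
    action: str

    def to_pseudocode(self) -> str:
        # Assumed pseudocode layout; the source does not specify one here.
        if self.condition is None:
            return f"DO {self.subject}: {self.action}"
        return f"IF {self.condition} THEN {self.subject}: {self.action}"


def formalize(requirement: str) -> Optional[Formalized]:
    """Apply the rules in order; return None if no rule matches."""
    text = requirement.strip()
    for rule in (CONDITIONAL_RULE, UNCONDITIONAL_RULE):
        match = rule.match(text)
        if match:
            groups = match.groupdict()
            return Formalized(
                condition=groups.get("condition"),
                subject=groups["subject"],
                action=groups["action"],
            )
    return None  # Unmatched requirements are left for manual review.


if __name__ == "__main__":
    for req in (
        "If the door is open, the train shall not depart.",
        "The system shall log all errors.",
        "Logging is nice.",
    ):
        result = formalize(req)
        print(result.to_pseudocode() if result else f"UNMATCHED: {req}")
```

Requirements that no rule matches are returned as `None` rather than guessed at, which mirrors the semi-automated setting: unmatched sentences are the natural input for the next iteration of rule refinement.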
