Advances in software development methodologies have drawn developers' attention to automatic Requirement Formalisation (RF) in the field of Requirement Engineering (RE). Several studies report that Natural Language Processing (NLP) and Machine Learning (ML) can help reduce the ambiguity and incompleteness of requirements written in natural language. The goal of this paper is to survey and classify existing work on NLP and ML for RF, identify the challenges in this domain, and suggest promising directions for future research. To this end, we conducted a systematic literature review to outline the current state of the art in NLP and ML techniques for RF, selecting 257 papers from commonly used digital libraries. We then filtered these results using inclusion and exclusion criteria and selected 47 relevant studies published between 2012 and 2022. We found that heuristic NLP approaches are the most common NLP techniques used for automatic RF, operating primarily on structured and semi-structured data. The review also revealed that Deep Learning (DL) techniques are not widely used; classical ML techniques predominate in the surveyed studies. More importantly, we found that the performance of different approaches is difficult to compare because standard benchmark cases for RF are lacking.