Requirements Satisfaction Assessment (RSA) evaluates whether the set of design elements linked to a single requirement provides sufficient coverage of that requirement, which typically means that every concept in the requirement is addressed by at least one of the design elements. RSA is an important software engineering activity for systems with any form of hierarchical decomposition, especially safety- or mission-critical ones. In previous studies, researchers used basic Information Retrieval (IR) models to decompose requirements and design elements into chunks, and then evaluated the extent to which the design-element chunks covered all chunks in the requirement. However, accuracy was low because many critical concepts that span the entire sentence were poorly represented once the sentence was parsed into independent chunks. In this paper, we leverage recent advances in natural language processing to deliver significantly more accurate results. We propose two major architectures, Satisfaction BERT (Sat-BERT) and Dual-Satisfaction BERT (DSat-BERT), along with their multitask learning variants, to improve satisfaction assessments. We perform RSA on five different datasets and compare the results of our variants against the chunk-based legacy approach. All BERT-based models significantly outperformed the legacy baseline, and Sat-BERT delivered the best results, with an average improvement of 124.75% in Mean Average Precision.
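For readers unfamiliar with the reported metric, the following is a minimal sketch of how Mean Average Precision and a relative improvement over a baseline are commonly computed. It is not the evaluation code used in this work: the function names, the binary relevance labels, and the assumption that each ranked list contains all of its relevant items are illustrative choices only.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list of binary relevance labels.

    Assumption: every relevant item appears somewhere in the ranking,
    so the sum is normalized by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage improvement of `new` over a non-zero `baseline`."""
    return (new - baseline) / baseline * 100.0
```

Under this definition, an improvement above 100% means the new score is more than double the baseline score; for example, hypothetical scores of 0.20 and 0.45 correspond to a relative improvement of about 125%.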