Natural Language Requirements Testability Measurement Based on Requirement Smells

Requirements form the basis for defining a software system's obligations and tasks. Testable requirements help prevent failures, reduce maintenance costs, and simplify acceptance testing. However, despite the importance of quantifying requirements testability, no automatic approach has been proposed that measures it on the basis of requirements smells, which work against testability. This paper presents a mathematical model that evaluates and ranks the testability of natural language requirements based on a set of nine automatically detected requirements smells and on the acceptance test effort, which is determined by the requirement's length and application domain. Most of the smells stem from uncountable adjectives, context-sensitive words, and ambiguous words, so detecting them requires a comprehensive dictionary. We use a neural word-embedding technique to generate such a dictionary automatically. Using the dictionary, we automatically detect the Polysemy smell (domain-specific ambiguity) for the first time, in 10 application domains. Our empirical study on nearly 1000 software requirements from six well-known industrial and academic projects shows that the proposed smell detection approach outperforms Smella, a state-of-the-art tool. Precision and recall of smell detection improve by an average of 0.03 and 0.33, respectively, compared to the state of the art. The proposed model measures the testability of 985 requirements with a mean absolute error of 0.12 and a mean squared error of 0.03, demonstrating its potential for practical use.
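
To make the pipeline described above more concrete, the following is a minimal Python sketch of dictionary-based smell detection and of the evaluation metrics the abstract reports (precision, recall, mean absolute error, mean squared error). It is not the paper's implementation: the dictionary contents, the smell category names, and all function names are hypothetical, and the paper's dictionary is generated with word embeddings rather than written by hand.

```python
import re

# Hypothetical hand-written dictionary, for illustration only.
# The category names and word lists are assumptions, not the paper's.
SMELL_WORDS = {
    "subjective_language": {"user-friendly", "easy", "fast", "efficient"},
    "vague_terms": {"some", "several", "appropriate", "adequate"},
}


def detect_smells(requirement):
    """Return the smell categories whose dictionary words occur in the text."""
    # Lowercase tokens; hyphenated words such as "user-friendly" stay whole.
    tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", requirement.lower()))
    return {smell for smell, words in SMELL_WORDS.items() if tokens & words}


def precision_recall(predicted, actual):
    """Precision and recall of predicted smell labels against reference labels."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference scores."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def mean_squared_error(predicted, reference):
    """Average squared difference between predicted and reference scores."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
```

In this sketch, a requirement such as "The system shall provide a user-friendly interface with adequate response time." is flagged with both illustrative categories, while "The system shall respond within 2 seconds." is flagged with none. The error functions assume testability scores on a common numeric scale, which is an assumption of this example.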
