Using Language Models for Enhancing the Completeness of Natural-language Requirements

From arXiv. This paper has been accepted at the 29th International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ 2023).

[Context and motivation] Incompleteness in natural-language requirements is a challenging problem. [Question/problem] A common technique for detecting incompleteness in requirements is checking the requirements against external sources. With the emergence of language models such as BERT, an interesting question is whether language models are useful external sources for finding potential incompleteness in requirements. [Principal ideas/results] We mask words in requirements and have BERT's masked language model (MLM) generate contextualized predictions for filling the masked slots. We simulate incompleteness by withholding content from requirements and measure BERT's ability to predict terminology that is present in the withheld content but absent from the content disclosed to BERT. [Contribution] BERT can be configured to generate multiple predictions per mask. Our first contribution is to determine how many predictions per mask offer an optimal trade-off between effectively discovering omissions in requirements and the level of noise in the predictions. Our second contribution is a machine learning-based filter that post-processes the predictions made by BERT to further reduce noise. We empirically evaluate our solution over 40 requirements specifications drawn from the PURE dataset [1]. Our results indicate that: (1) predictions made by BERT are highly effective at pinpointing terminology that is missing from requirements, and (2) our filter can substantially reduce noise from the predictions, thus making BERT a more compelling aid for improving completeness in requirements.
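
The evaluation idea in the abstract (mask words in the disclosed requirements, collect several predictions per mask, and check which predicted terms appear only in the withheld content) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names (`masked_variants`, `evaluate`, `toy_predict`, `bert_predictor`) are hypothetical, masking every whitespace-separated word is a simplification, the two ratios are just one plausible way to quantify "discovered omissions" versus "noise", and the model name `bert-base-cased` is an assumption. The predictor is pluggable so that the sketch runs without downloading a model.

```python
import re
from typing import Callable, Dict, Iterator, List

MASK = "[MASK]"  # BERT's mask token
Predictor = Callable[[str, int], List[str]]  # (masked text, k) -> top-k fillers


def tokenize(text: str) -> List[str]:
    """Lower-cased alphabetic tokens; a deliberately crude stand-in for real NLP."""
    return re.findall(r"[a-z]+", text.lower())


def masked_variants(sentence: str) -> Iterator[str]:
    """Yield one copy of the sentence per word, with that word replaced by MASK."""
    words = sentence.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + [MASK] + words[i + 1:])


def evaluate(disclosed: List[str], withheld: List[str],
             predict: Predictor, k: int) -> Dict[str, object]:
    """Compare predicted terms against terms found only in the withheld half."""
    disclosed_vocab = {w for s in disclosed for w in tokenize(s)}
    withheld_vocab = {w for s in withheld for w in tokenize(s)}
    target = withheld_vocab - disclosed_vocab  # terminology to rediscover

    predicted = set()
    for sentence in disclosed:
        for masked in masked_variants(sentence):
            # Keep whole-word predictions only (drops pieces such as "##ing").
            predicted.update(p.lower() for p in predict(masked, k) if p.isalpha())

    novel = predicted - disclosed_vocab  # predictions that add new terms
    hits = novel & target
    return {
        "hits": sorted(hits),
        "coverage": len(hits) / len(target) if target else 0.0,
        "precision": len(hits) / len(novel) if novel else 0.0,
    }


def toy_predict(masked_text: str, k: int) -> List[str]:
    """Stand-in predictor with a fixed ranking, so the sketch runs offline."""
    ranked = ["operator", "alarm", "banana", "log", "purple"]
    return ranked[:k]


def bert_predictor(model_name: str = "bert-base-cased") -> Predictor:
    """Optional: wrap the Hugging Face fill-mask pipeline (needs `transformers`)."""
    from transformers import pipeline  # imported lazily; not needed for the toy run

    fill = pipeline("fill-mask", model=model_name)

    def predict(masked_text: str, k: int) -> List[str]:
        return [r["token_str"].strip() for r in fill(masked_text, top_k=k)]

    return predict


if __name__ == "__main__":
    disclosed = ["The system shall notify the user."]
    withheld = ["The operator shall acknowledge each alarm."]
    for k in (1, 2, 5):
        print(k, evaluate(disclosed, withheld, toy_predict, k))
```

With the toy predictor, raising the number of predictions per mask from 2 to 5 leaves coverage unchanged while precision drops, which mirrors the trade-off between discovering omissions and adding noise that the first contribution addresses. Passing `bert_predictor()` instead of `toy_predict` would run the same loop against an actual masked language model.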
