Natural language (NL) is arguably the most prevalent medium for expressing systems and software requirements. Detecting incompleteness in NL requirements is a major challenge. One approach to identify incompleteness is to compare requirements with external sources. Given the rise of large language models (LLMs), an interesting question arises: Are LLMs useful external sources of knowledge for detecting potential incompleteness in NL requirements? This article explores this question by utilizing BERT. Specifically, we employ BERT's masked language model (MLM) to generate contextualized predictions for filling masked slots in requirements. To simulate incompleteness, we withhold content from the requirements and assess BERT's ability to predict terminology that is present in the withheld content but absent in the disclosed content. BERT can produce multiple predictions per mask. Our first contribution is determining the optimal number of predictions per mask, striking a balance between effectively identifying omissions in requirements and mitigating noise present in the predictions. Our second contribution involves designing a machine learning-based filter to post-process BERT's predictions and further reduce noise. We conduct an empirical evaluation using 40 requirements specifications from the PURE dataset. Our findings indicate that: (1) BERT's predictions effectively highlight terminology that is missing from requirements, (2) BERT outperforms simpler baselines in identifying relevant yet missing terminology, and (3) our filter significantly reduces noise in the predictions, enhancing BERT's effectiveness as a tool for completeness checking of requirements.
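The core check described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: `predict_for_mask` is a stub standing in for BERT's top-k masked-language-model predictions, and all names and example sentences are invented for the sketch, not taken from the article's implementation or the PURE dataset.

```python
def predict_for_mask(context: str, k: int) -> list[str]:
    # Stub standing in for BERT's MLM: a real system would insert a [MASK]
    # token into the context and return BERT's top-k candidate fillers.
    candidates = ["authentication", "encryption", "login", "timeout", "user"]
    return candidates[:k]

def missing_terms(disclosed: str, withheld: str, k: int) -> set[str]:
    """Predicted terms that occur in the withheld text but not in the
    disclosed text -- i.e., hints at genuinely omitted terminology."""
    disclosed_vocab = set(disclosed.lower().split())
    withheld_vocab = set(withheld.lower().split())
    predictions = set(predict_for_mask(disclosed, k))
    # Keep only predictions that the withheld content confirms as relevant
    # and that the disclosed content does not already mention.
    return predictions & (withheld_vocab - disclosed_vocab)

disclosed = "the system shall require a valid login before granting access"
withheld = "authentication attempts shall be limited and use encryption"
print(missing_terms(disclosed, withheld, k=5))
```

The choice of `k` mirrors the article's first contribution: a larger `k` surfaces more of the withheld terminology but also admits more noise, which is what the learned post-processing filter is then meant to reduce.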