Zero-Shot Learning for Requirements Classification: An Exploratory Study

Context and motivation: Requirements Engineering (RE) researchers have been experimenting with Machine Learning (ML) and Deep Learning (DL) approaches for a range of RE tasks, such as requirements classification, requirements tracing, ambiguity detection, and modelling.

Question/problem: Most of today's ML/DL approaches are based on supervised learning techniques, meaning that they must be trained on annotated datasets to learn how to assign class labels to items from an application domain. This constraint poses an enormous challenge to RE researchers: the lack of annotated datasets makes it difficult for them to fully exploit the benefits of advanced ML/DL technologies.

Principal ideas/results: To address this challenge, this paper proposes an approach that employs the embedding-based unsupervised Zero-Shot Learning (ZSL) technique to perform requirements classification. We focus on classification because many RE tasks can be framed as classification problems. In this study, we demonstrate our approach on three tasks: (1) FR-NFR: classification of functional vs. non-functional requirements; (2) NFR: identification of NFR classes; (3) Security: classification of security vs. non-security requirements. The ZSL approach achieves an F1 score of 0.66 on the FR-NFR task. On the NFR task, it yields F1 scores of about 0.72 to 0.80 for the most frequent classes. On the Security task, it achieves an F1 score of about 0.66. All of these F1 scores are obtained without any task-specific training.

Contribution: This study demonstrates the potential of ZSL for requirements classification. An important implication is that multiple tasks can be performed with very little or no training data. The proposed approach thus contributes to addressing the longstanding problem of data shortage in RE.
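
The core idea of embedding-based ZSL classification can be illustrated with a minimal sketch: embed each requirement and each candidate label description into the same vector space, then assign the label whose description is most similar (here, by cosine similarity). No annotated requirements are used; only the label names and descriptions. This is an illustration under stated assumptions, not the study's implementation: the `embed` function is a hypothetical keyword-counting stand-in for a pretrained language-model encoder, and `CONCEPTS`, `LABELS`, `cosine`, and `zero_shot_classify` are illustrative names, not from the paper.

```python
import math
import re

# Hypothetical stand-in for a pretrained text encoder: each dimension counts
# cue words for one concept. It exists only to make the sketch runnable;
# it is NOT the embedding model used in the study.
CONCEPTS = {
    "performance": {"performance", "fast", "seconds", "response", "latency", "throughput"},
    "security": {"security", "secure", "encrypt", "encrypted", "password",
                 "passwords", "authenticate", "unauthorized"},
    "usability": {"usability", "easy", "intuitive", "learn"},
}
AXES = sorted(CONCEPTS)  # fixed order of vector dimensions


def embed(text: str) -> list[int]:
    """Map text to a vector in the space shared by labels and requirements."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [sum(tok in CONCEPTS[axis] for tok in tokens) for axis in AXES]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def zero_shot_classify(requirement: str, labels: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Pick the label whose description embedding is closest to the requirement.

    No annotated training examples are used, only the label descriptions.
    """
    req_vec = embed(requirement)
    scores = {name: cosine(req_vec, embed(desc)) for name, desc in labels.items()}
    return max(scores, key=scores.get), scores


# Hypothetical label descriptions for an NFR-style task.
LABELS = {
    "Security": "Security: protect the system against unauthorized access; "
                "encrypt data and authenticate users.",
    "Performance": "Performance: response time, latency and throughput of the system.",
    "Usability": "Usability: how easy the system is to learn and use.",
}

if __name__ == "__main__":
    for req in [
        "The system shall encrypt all stored passwords.",
        "Search results shall be returned within 2 seconds.",
        "New users shall learn the main screen in under ten minutes.",
    ]:
        label, _ = zero_shot_classify(req, LABELS)
        print(f"{label:12} <- {req}")
```

Under these assumptions, the same pattern could frame the FR-NFR or Security tasks simply by swapping the label dictionary (for example, "security" vs. "non-security" descriptions). In a realistic setup, `embed` would be replaced by a pretrained sentence encoder, so that semantic similarity does not depend on hand-picked keywords.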
