Early detection and resolution of duplicate and conflicting requirements can significantly improve project efficiency and overall software quality. Researchers have developed various computational predictors that leverage Artificial Intelligence (AI) to detect duplicate and conflicting requirements. However, these predictors fall short in performance, and more effective approaches are needed to support software development processes. To address the need for a single predictor that can accurately identify both duplicate and conflicting requirements, this research offers a comprehensive framework that facilitates the development of 3 different types of predictive pipelines: language-model-based, multi-model similarity knowledge-driven, and large language model (LLM) context + multi-model similarity knowledge-driven. In the first type, the framework identifies conflicting and duplicate requirements by leveraging 8 distinct types of LLMs. In the second type, the framework supports the development of predictive pipelines that leverage multi-scale and multi-model similarity knowledge, ranging from traditional similarity computation methods to advanced similarity vectors generated by LLMs. In the third type, the framework synthesizes hybrid predictive pipelines by integrating contextual insights from LLMs with multi-model similarity knowledge. Across 6 public benchmark datasets, extensive testing of 760 distinct predictive pipelines demonstrates that the hybrid predictive pipelines consistently outperform the other two types in accurately identifying duplicate and conflicting requirements. The hybrid predictive pipeline outperforms existing state-of-the-art predictors by an overall margin of 13% in terms of F1-score.
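The hybrid idea described above, combining a language-model context representation with several similarity scores for a requirement pair, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`similarity_features`, `hybrid_features`), the three similarity measures chosen (token Jaccard, token-count cosine, character-trigram cosine), and the plain concatenation step are hypothetical and are not taken from the framework itself. The `context_vector` argument stands in for whatever pair embedding a language model would produce; no model is called here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a requirement and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(tokens_a, tokens_b):
    """Token-set overlap: size of the intersection over size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    sq_u = sum(x * x for x in u.values())
    sq_v = sum(x * x for x in v.values())
    norm = math.sqrt(sq_u * sq_v)
    return dot / norm if norm else 0.0


def char_ngrams(text, n=3):
    """Count character n-grams over the normalized (tokenized, re-joined) text."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def similarity_features(req_a, req_b):
    """Illustrative 'multi-model' similarity vector for one requirement pair."""
    ta, tb = tokenize(req_a), tokenize(req_b)
    return [
        jaccard(ta, tb),                                  # word-set overlap
        cosine(Counter(ta), Counter(tb)),                 # word-count cosine
        cosine(char_ngrams(req_a), char_ngrams(req_b)),   # character-trigram cosine
    ]


def hybrid_features(req_a, req_b, context_vector):
    """Concatenate a (hypothetical) language-model pair embedding with the
    similarity vector; the result would be fed to a downstream classifier."""
    return list(context_vector) + similarity_features(req_a, req_b)
```

In this sketch, a pair of requirements is reduced to a single feature vector, and any standard classifier could then be trained on such vectors to label pairs as duplicate, conflicting, or neutral. Which similarity measures, embeddings, and classifiers the framework actually uses is not specified in the text above.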