Software Metadata Classification based on Generative Artificial Intelligence

This paper presents a novel approach to improving binary classification models for code comment quality through the application of Generative Artificial Intelligence (AI). Using the OpenAI API, a dataset of 1239 newly generated code-comment pairs, drawn from various GitHub repositories and open-source projects, has been labelled as "Useful" or "Not Useful" and integrated into the existing corpus of 9048 pairs in the C programming language. The dataset, produced with a Large Language Model, yields notable improvements in model performance. Specifically, when it is incorporated into the Support Vector Machine (SVM) model, precision rises by 6 percentage points, from 0.79 to 0.85. The Artificial Neural Network (ANN) model shows a 1.5 percentage point increase in recall, from 0.731 to 0.746. These results indicate that Generative AI can effectively augment training data for code comment quality classification, and they suggest that the method may apply more broadly within software development and quality assurance. The findings underscore the value of integrating generative techniques to improve the accuracy of machine learning models in practical software engineering scenarios.
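
The reported gains (precision 0.79 to 0.85, recall 0.731 to 0.746) can be read either as absolute differences or as relative changes. The short sketch below, which is illustrative and not taken from the paper, defines the two metrics in the usual way and computes both readings of each gain. The function names `precision`, `recall`, and `gains` are hypothetical helpers introduced here for illustration only.

```python
def precision(tp: int, fp: int) -> float:
    """Share of pairs predicted "Useful" that are actually "Useful"."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actually "Useful" pairs that the model predicts as "Useful"."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = (after - before) * 100
    relative_pct = (after - before) / before * 100
    return absolute_pp, relative_pct


# Scores as reported in the abstract.
svm_pp, svm_rel = gains(0.79, 0.85)    # SVM precision
ann_pp, ann_rel = gains(0.731, 0.746)  # ANN recall

print(f"SVM precision: +{svm_pp:.1f} points ({svm_rel:.1f}% relative)")
print(f"ANN recall:    +{ann_pp:.1f} points ({ann_rel:.1f}% relative)")
```

Under this reading, the 6% and 1.5% figures correspond to absolute differences of 6.0 and 1.5 percentage points; the relative changes are somewhat larger, roughly 7.6% and 2.1%.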
