This paper presents the system submitted by the team from IIT(ISM) Dhanbad to shared task 1 of FIRE IRSE 2023, which addresses the automatic prediction of the usefulness of code-comment pairs with respect to their associated source code, as well as the impact of adding Large Language Model (LLM) generated data to the original base data. We developed a framework in which a machine learning model is trained on neural contextual representations of comments and their corresponding code to predict the usefulness of each code-comment pair, and we analyze performance when the base data is combined with LLM-generated data. In the official assessment, our system achieves a 4% increase in F1-score over the baseline, and the evaluation also reflects the quality of the generated data.
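The framework described above can be illustrated with a minimal sketch, which is not the submitted system. It assumes that each comment and its code are encoded separately, that the two vectors are concatenated, and that a logistic regression classifier is trained on the result; none of these choices is confirmed by the abstract. The `embed` function is a hashed bag-of-tokens placeholder standing in for a neural contextual encoder, and the function names, the dimension `DIM`, and the four toy code-comment pairs are all hypothetical.

```python
import hashlib
import math

DIM = 16  # hypothetical size of each text representation


def embed(text, dim=DIM):
    """Placeholder encoder: hashed bag-of-tokens, L2-normalised.

    Assumption: a real system would call a pretrained contextual model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def pair_features(comment, code):
    """Concatenate the comment and code representations into one vector."""
    return embed(comment) + embed(code)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=300, lr=0.5):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model, x):
    """Return 1 (useful) or 0 (not useful) for one feature vector."""
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def f1_score(y_true, y_pred):
    """F1 for the positive (useful) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy pairs: (comment, code, label), label 1 = useful, 0 = not useful.
toy_pairs = [
    ("reject input longer than the buffer to avoid overflow",
     "if (len > BUF_SIZE) return -1;", 1),
    ("retry up to three times because the socket may time out",
     "for (i = 0; i < 3; i++) if (send_msg(fd) == 0) break;", 1),
    ("increment i", "i++;", 0),
    ("declare variable", "int x;", 0),
]
X = [pair_features(comment, code) for comment, code, _ in toy_pairs]
y = [label for _, _, label in toy_pairs]
model = train(X, y)
predictions = [predict(model, x) for x in X]
print("F1 on the toy training pairs:", f1_score(y, predictions))
```

In such a setup, the effect of LLM-generated data could be examined by appending generated pairs to the training list and comparing the F1-score on a held-out set against the score obtained from the base data alone.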