Context: In collaborative software development, peer code review is beneficial only when reviewers provide useful comments. Objective: This paper investigates the usefulness of code review comments (CR comments) through textual feature-based and featureless approaches. Method: We select three available datasets from both open-source and commercial projects and introduce new features from software and non-software domains. We also experiment with the presence of jargon, voice, and code in CR comments, and classify the usefulness of CR comments through featurization, bag-of-words, and transfer learning techniques. Results: Our models outperform the baseline and achieve state-of-the-art performance. Furthermore, the results demonstrate that either the large commercial LLM GPT-4o or the simple non-commercial featureless approach, Bag-of-Words with TF-IDF, is more effective than the other approaches considered for predicting the usefulness of CR comments. Conclusion: The significant improvement in predicting usefulness solely from CR comments advances research on this task. Our analyses show the similarities and differences across domains, projects, datasets, models, and features for predicting the usefulness of CR comments.
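To make the featureless Bag-of-Words with TF-IDF idea concrete, the following is a minimal sketch in plain Python rather than the paper's actual pipeline. The abstract does not name the classifier, so the nearest-centroid rule with cosine similarity used here is an assumption, as are the helper names (`tokenize`, `fit_idf`, `tfidf`, `train`, `predict`), the smoothed IDF formula, and the six toy comments, which are made up for illustration and do not come from any of the three datasets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the comment and split it into word tokens (bag of words)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each token once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Return an L2-normalised sparse TF-IDF vector; unseen tokens are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(comments, labels):
    """Fit IDF weights and one mean TF-IDF vector (centroid) per label."""
    docs = [tokenize(c) for c in comments]
    idf = fit_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    centroids = {}
    for label in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == label]
        total = Counter()
        for v in members:
            for t, w in v.items():
                total[t] += w
        centroids[label] = {t: w / len(members) for t, w in total.items()}
    return idf, centroids


def predict(comment, idf, centroids):
    """Assign the label whose centroid is most similar to the comment."""
    vec = tfidf(tokenize(comment), idf)
    # sorted() makes tie-breaking deterministic
    return max(sorted(centroids), key=lambda lab: cosine(vec, centroids[lab]))


# Toy, made-up training data (assumption: binary useful / not_useful labels).
comments = [
    "This loop can throw a null pointer exception when the list is empty; add a guard.",
    "Rename this variable to reflect its unit and add a test for the overflow case.",
    "The query is missing an index, so this call will be slow; add an index.",
    "Looks good to me.",
    "Thanks, nice work.",
    "Ok.",
]
labels = ["useful", "useful", "useful", "not_useful", "not_useful", "not_useful"]

idf, centroids = train(comments, labels)
print(predict("Please add a null check before this call", idf, centroids))
print(predict("Nice, looks good", idf, centroids))
```

No hand-crafted features are computed here: the classifier sees only weighted token counts, which is what makes the approach featureless. A realistic setup would swap the toy data for a labeled dataset and would likely use a stronger classifier on top of the same TF-IDF vectors.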