Commit messages explain the changes made to a codebase and are stored in version control systems. They help developers understand the codebase as it evolves. However, writing commit messages is tedious, and their quality and style vary across developers. To address this, researchers have proposed rule-based, retrieval-based, and learning-based approaches to generating commit messages automatically. Advances in large language models offer new possibilities for this task. In this study, we evaluate how well OpenAI's ChatGPT generates commit messages from code changes. We compare its results with those of previous Automatic Commit Message Generation (ACMG) methods that were trained specifically on commit data. Our goal is to assess the extent to which large pre-trained language models can generate commit messages that are acceptable both quantitatively and qualitatively. We found that ChatGPT outperformed previous ACMG methods by orders of magnitude and that the messages it generates are generally both accurate and of high quality. We also provide insights into, and a categorization of, the cases where it fails.