Automatic Code Summarization via ChatGPT: How Far Are We?

To support software developers in understanding and maintaining programs, various automatic code summarization techniques have been proposed to generate a concise natural language comment for a given code snippet. Recently, the emergence of large language models (LLMs) has greatly boosted performance on natural language processing tasks. Among them, ChatGPT is the most popular and has attracted wide attention from the software engineering community. However, it remains unclear how ChatGPT performs on automatic code summarization. In this paper, we therefore evaluate ChatGPT on a widely used Python dataset, CSN-Python, and compare it with several state-of-the-art (SOTA) code summarization models. Specifically, we first search for a prompt that guides ChatGPT to generate in-distribution comments. We then use that prompt to ask ChatGPT to generate comments for all code snippets in the CSN-Python test set. We adopt three widely used metrics (BLEU, METEOR, and ROUGE-L) to measure the quality of the comments generated by ChatGPT and by the SOTA models (NCS, CodeBERT, and CodeT5). The experimental results show that, in terms of BLEU and ROUGE-L, ChatGPT's code summarization performance is significantly worse than that of all three SOTA models. We also present some cases and discuss the advantages and disadvantages of ChatGPT in code summarization. Based on these findings, we outline several open challenges and opportunities in ChatGPT-based code summarization.
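
To make the metric-based comparison concrete, the following is a minimal sketch of a sentence-level ROUGE-L score between a generated comment and a reference comment. It is not the paper's evaluation script: the function names, the lowercase whitespace tokenization, and the use of an unweighted F1 (rather than a beta-weighted F-measure) are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L: F1 over LCS-based precision and recall.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "return the sum of two numbers"
    reference = "returns the sum of two integers"
    print(round(rouge_l_f1(generated, reference), 4))
```

Because the score rewards only token overlap with the reference, a comment that is accurate but worded differently from the reference can still score low, which is worth keeping in mind when reading overlap-based comparisons.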

Related content

ChatGPT

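ChatGPT (full name: Chat Generative Pre-trained Transformer) is a chatbot program developed by the US company OpenAI [1] and released on November 30, 2022. ChatGPT is an AI-driven natural language processing tool that converses by learning and understanding human language, interacts based on the context of the conversation, and chats much as a human would. It can also complete tasks such as writing emails, video scripts, marketing copy, translations, code, and papers. [1] https://openai.com/blog/chatgpt/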