Large language models (LLMs) have demonstrated impressive capabilities across a range of natural language processing (NLP) tasks, such as machine translation, question answering, and summarization. LLMs are also highly valuable for supporting software engineering tasks, particularly code generation. Automatic code generation is the process of producing source code or executable code from given specifications or requirements, which can improve developer productivity. In this study, we perform a systematic empirical assessment of code generation using ChatGPT, a recent and popular LLM. Our evaluation comprehensively analyzes code snippets generated by ChatGPT, focusing on three critical aspects: correctness, understandability, and security. We also specifically investigate ChatGPT's ability to facilitate code generation through a multi-round process (i.e., its dialog ability). By examining the generated code and the experimental results, this work provides insights into ChatGPT's performance on code generation tasks. Overall, our findings uncover potential issues and limitations of ChatGPT-based code generation and lay the groundwork for improving AI- and LLM-based code generation techniques.