The increasing use of Large Language Models (LLMs) in software development has drawn significant attention from researchers assessing the quality of the code they generate. However, much of this research focuses on controlled datasets such as HumanEval, which neither adequately represent how developers actually use LLMs' code generation capabilities nor clarify the characteristics of LLM-generated code in real-world development scenarios. To bridge this gap, our study investigates the characteristics of LLM-generated code and its corresponding projects hosted on GitHub. Our findings reveal several key insights: (1) ChatGPT and Copilot are the LLMs most frequently used to generate code on GitHub; very little code on GitHub is generated by other LLMs. (2) Projects containing ChatGPT/Copilot-generated code are often small and less well known, led by individuals or small teams. Despite this, most of these projects are continuously evolving and improving. (3) ChatGPT/Copilot are mainly used to generate Python, Java, and TypeScript scripts for data processing and transformation, whereas C/C++ and JavaScript code generation focuses on algorithm and data structure implementation and user interface code. Most ChatGPT/Copilot-generated code snippets are relatively short and exhibit low complexity. (4) Compared to human-written code, ChatGPT/Copilot-generated code exists in a small proportion of projects and generally undergoes fewer modifications. Modifications due to bugs are fewer still, ranging from just 3% to 8% across different languages. (5) Most comments on ChatGPT/Copilot-generated code lack detailed information, often stating only the code's origin without mentioning prompts, human modifications, or testing status. Based on these findings, we discuss the implications for researchers and practitioners.