Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages from a given prompt, fulfilling a long-standing dream in Software Engineering (SE): automatic code generation. Like human-written code, LLM-generated code is prone to bugs, and these bugs have not yet been thoroughly examined by the community. Given the increasing adoption of LLM-based code generation tools (e.g., GitHub Copilot) in SE activities, it is critical to understand the characteristics of the bugs in code generated by LLMs. This paper examines a sample of 333 bugs collected from code generated by three leading LLMs (i.e., CodeGen, PanGu-Coder, and Codex) and identifies the following 10 distinctive bug patterns: Misinterpretations, Syntax Error, Silly Mistake, Prompt-biased code, Missing Corner Case, Wrong Input Type, Hallucinated Object, Wrong Attribute, Incomplete Generation, and Non-Prompted Consideration. The bug patterns are presented in the form of a taxonomy and validated through an online survey of 34 LLM practitioners and researchers. The survey participants generally confirmed the significance and prevalence of the bug patterns. Researchers and practitioners can leverage these findings to develop effective quality assurance techniques for LLM-generated code. This study sheds light on the distinctive characteristics of LLM-generated code.