Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages from a given prompt, fulfilling a long-standing dream in Software Engineering (SE): automatic code generation. Like human-written code, LLM-generated code is prone to bugs, and these bugs have not yet been thoroughly examined by the community. Given the increasing adoption of LLM-based code generation tools (e.g., GitHub Copilot) in SE activities, it is critical to understand the characteristics of the bugs in LLM-generated code. This paper examines a sample of 333 bugs collected from code generated by three leading LLMs (CodeGen, PanGu-Coder, and Codex) and identifies 10 distinctive bug patterns: Misinterpretations, Syntax Error, Silly Mistake, Prompt-biased code, Missing Corner Case, Wrong Input Type, Hallucinated Object, Wrong Attribute, Incomplete Generation, and Non-Prompted Consideration. The bug patterns are organized into a taxonomy and validated through an online survey of 34 LLM practitioners and researchers, who generally confirmed the significance and prevalence of the patterns. Researchers and practitioners can leverage these findings to develop effective quality assurance techniques for LLM-generated code. This study sheds light on the distinctive characteristics of LLM-generated code.