A Survey on Large Language Models for Code Generation

Large Language Models (LLMs) for code, known as Code LLMs, have achieved remarkable advances across diverse code-related tasks, particularly code generation, in which an LLM produces source code from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and industry professionals because of its practical significance in software development, as exemplified by GitHub Copilot. Despite the active exploration of LLMs for a variety of code tasks, from the perspective of natural language processing (NLP), software engineering (SE), or both, there is a noticeable absence of a comprehensive and up-to-date literature review dedicated to LLMs for code generation. In this survey, we aim to bridge this gap by providing a systematic literature review that serves as a valuable reference for researchers investigating the cutting-edge progress in LLMs for code generation. We introduce a taxonomy to categorize and discuss recent developments in LLMs for code generation, covering aspects such as data curation, latest advances, performance evaluation, ethical implications, environmental impact, and real-world applications. In addition, we present a historical overview of the evolution of LLMs for code generation and offer an empirical comparison on the HumanEval, MBPP, and BigCodeBench benchmarks, across various levels of difficulty and types of programming tasks, to highlight the progressive improvement in LLM capabilities for code generation. We identify critical challenges and promising opportunities regarding the gap between academia and practical development. Furthermore, we have established a dedicated GitHub resource page (https://github.com/juyongjiang/CodeLLMSurvey) to continuously document and disseminate the most recent advances in the field.
