Large Language Models (LLMs) have achieved remarkable advances across diverse code-related tasks, where they are known as Code LLMs, particularly in code generation, the task of producing source code from natural language descriptions. This burgeoning field has attracted significant interest from both academic researchers and industry professionals because of its practical significance in software development, as exemplified by GitHub Copilot. Despite the active exploration of LLMs for a variety of code tasks, from the perspective of natural language processing (NLP), software engineering (SE), or both, there is a noticeable absence of a comprehensive and up-to-date literature review dedicated to LLMs for code generation. In this survey, we aim to bridge this gap by providing a systematic literature review that serves as a valuable reference for researchers investigating the cutting-edge progress in LLMs for code generation. We introduce a taxonomy to categorize and discuss the recent developments in LLMs for code generation, covering aspects such as data curation, latest advances, performance evaluation, and real-world applications. In addition, we present a historical overview of the evolution of LLMs for code generation and offer an empirical comparison on the widely recognized HumanEval and MBPP benchmarks to highlight the progressive enhancements in LLM capabilities for code generation. We identify critical challenges and promising opportunities regarding the gap between academia and practical development. Furthermore, we have established a dedicated resource website (https://codellm.github.io) to continuously document and disseminate the most recent advances in the field.