A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

General large language models (LLMs), represented by ChatGPT, have demonstrated significant potential in software engineering tasks such as code generation. This has led to the development of specialized LLMs for software engineering, known as Code LLMs. A considerable portion of Code LLMs is derived from general LLMs through fine-tuning. As a result, Code LLMs are updated frequently, and their performance can be influenced by their base LLMs. However, there is currently a lack of systematic investigation into Code LLMs and their performance. In this study, we conduct a comprehensive survey and analysis of the types of Code LLMs and their differences in performance compared to general LLMs. We aim to address three questions: (1) What LLMs are specifically designed for software engineering tasks, and what are the relationships among these Code LLMs? (2) Do Code LLMs really outperform general LLMs in software engineering tasks? (3) Which LLMs are more proficient in different software engineering tasks? To answer these questions, we first collect relevant literature and work from five major databases and open-source communities, resulting in 134 works for analysis. Next, we categorize the Code LLMs by publisher and examine their relationships with general LLMs and with one another. Furthermore, we investigate the performance differences between general LLMs and Code LLMs in various software engineering tasks to demonstrate the impact of base models and Code LLMs. Finally, we compile the performance of LLMs across multiple mainstream benchmarks to identify the best-performing LLMs for each software engineering task. Our research not only assists developers of Code LLMs in choosing base models for the development of more advanced LLMs but also provides insights for practitioners to better understand key improvement directions for Code LLMs.

