Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of NLP tasks and have recently extended their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL). This survey provides a comprehensive analysis of LLMs in the NL-PL domain, investigating how these models are applied to coding tasks and examining their methodologies, architectures, and training processes. We propose a taxonomy-based framework that organizes the relevant concepts into a unified classification system, facilitating a deeper understanding of this rapidly evolving field. The survey offers insights into the current state and future directions of LLMs in coding tasks, including their applications and limitations.