The integration of Large Language Models (LLMs) into software engineering has driven a transition from traditional rule-based systems to autonomous agentic systems capable of solving complex problems. However, systematic progress is hindered by the lack of a comprehensive understanding of how benchmarks and solutions interconnect. This survey addresses this gap by providing the first holistic analysis of LLM-powered software engineering, offering insights into evaluation methodologies and solution paradigms. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, covering tasks such as code generation, translation, and repair. Our analysis highlights the evolution from simple prompt engineering to sophisticated agentic systems that incorporate planning, reasoning, memory mechanisms, and tool augmentation. To contextualize this progress, we present a unified pipeline illustrating the workflow from task specification to deliverables, detailing how different solution paradigms address different levels of task complexity. Unlike prior surveys that focus narrowly on specific aspects, this work connects 50+ benchmarks to their corresponding solution strategies, enabling researchers to identify optimal approaches for diverse evaluation criteria. We also identify critical research gaps and propose future directions, including multi-agent collaboration, self-evolving systems, and the integration of formal verification. This survey serves as a foundational guide for advancing LLM-driven software engineering. We maintain a continuously updated GitHub repository of the reviewed and related papers at https://github.com/lisaGuojl/LLM-Agent-SE-Survey.