Generative Software Engineering

Rapid progress in deep learning, greater computational power, and the availability of vast training data have driven significant advances in pre-trained models and large language models (LLMs). Pre-trained models built on the Transformer architecture, such as BERT, and LLMs such as ChatGPT have demonstrated remarkable language capabilities and have found applications in software engineering (SE). SE tasks fall into many categories, among which generative tasks attract the most attention from researchers. Pre-trained models and LLMs offer powerful language representation and contextual awareness, which lets them leverage diverse training data and adapt to generative tasks through fine-tuning, transfer learning, and prompt engineering. These advantages make them effective tools for generative tasks, where they have demonstrated excellent performance. In this paper, we present a comprehensive literature review of generative tasks in SE that use pre-trained models and LLMs. We categorize SE generative tasks according to software engineering methodologies and summarize the advanced pre-trained models and LLMs involved, as well as the datasets and evaluation metrics used. Additionally, we identify key strengths, weaknesses, and gaps in existing approaches, and propose potential research directions. This review aims to provide researchers and practitioners with an in-depth analysis of, and guidance on, the application of pre-trained models and LLMs to generative tasks in SE.


