With the advent of large language models (LLMs) in artificial intelligence (AI), the field of software engineering (SE) has also witnessed a paradigm shift. By leveraging deep learning and massive amounts of data, these models have demonstrated an unprecedented capacity to understand, generate, and work with programming languages. They can assist developers across a broad spectrum of software development activities, including software design, automated programming, and maintenance, which could substantially reduce human effort. Integrating LLMs into the SE landscape (LLM4SE) has become a burgeoning trend, which makes it necessary to explore the challenges and opportunities of this emerging landscape. This paper revisits the software development life cycle (SDLC) under LLMs and highlights the challenges and opportunities of the new paradigm. The paper first summarizes the overall process of LLM4SE and then elaborates on the current challenges based on a thorough discussion. The discussion was held among more than 20 participants from academia and industry, specializing in fields such as software engineering and artificial intelligence. Specifically, we identify 26 key challenges across seven aspects: software requirements and design, coding assistance, test code generation, code review, code maintenance, software vulnerability management, and data, training, and evaluation. We hope the identified challenges will benefit future research in the LLM4SE field.