Large Language Models (LLMs) have significantly advanced the field of code generation, demonstrating the ability to produce functionally correct code snippets. However, advances in generative AI for code overlook foundational Software Engineering (SE) principles, such as modularity and single responsibility, and concepts such as cohesion and coupling, which are critical for creating maintainable, scalable, and robust software systems. These concepts are missing from pipelines that start with pre-training and end with benchmark-based evaluation. This vision paper argues for integrating SE knowledge into LLMs to enhance their ability to understand, analyze, and generate code and other SE artifacts in accordance with established SE knowledge. We aim to chart a new direction in which LLMs move beyond mere functional correctness to perform generative tasks that require adherence to SE principles and best practices. In addition, given the interactive nature of these conversational models, we propose using Bloom's Taxonomy as a framework to assess the extent to which they internalize SE knowledge. The proposed evaluation framework offers a sounder and more comprehensive assessment than existing approaches such as linear probing. SE-native generative models will not only overcome the shortcomings of current models but also pave the way for the next generation of generative models capable of handling real-world software engineering.