Large Language Models (LLMs) can generate functional source code from natural-language prompts, but they often fail to follow higher-level architectural structures or design patterns consistently. Because LLMs are increasingly used in software engineering, their ability to apply established design principles to generated code is crucial to the long-term success of software products. The goal of this paper is therefore to identify strategies for guiding LLMs to incorporate design patterns into the source code they generate. We designed a computational experiment to evaluate the ability of 13 LLMs to generate code that follows the Singleton design pattern on 164 Java coding challenges from HumanEval-X, using four prompting strategies: instructions, binary automated feedback, extensive automated feedback, and extensive feedback with few-shot prompts. Our results show that the optimal strategy for guiding LLMs to include design patterns depends heavily on the type of model. Overall, however, iterative binary feedback provides the best alignment with Singleton while preserving or improving the code's functionality. Guided by instructions, Llama 3.3 generated Singleton classes in 100% of cases and improved code functionality, raising the share of tests passed by 34.1 percentage points. It achieved a similar result when guided by instructions combined with binary feedback. Using binary feedback, Qwen 3 (8B) increased its alignment with Singleton to 99.2% and its functionality to 58.6%. Our results suggest that even simple strategies can be used to guide LLMs to use design patterns.
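To make the notion of "alignment with Singleton" and a binary feedback signal more concrete, the following Java sketch shows one way such a check could look. It is an illustration under stated assumptions, not the evaluation procedure used in the experiment, which this abstract does not describe. The class names `Solution`, `PlainSolution`, and `SingletonCheck`, the method names `looksLikeSingleton` and `binaryFeedback`, the feedback wording, and the two structural criteria (only private constructors, plus a static no-argument accessor that returns the same instance twice) are all hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical example of generated code that follows the Singleton pattern.
final class Solution {
    // Eagerly created single instance; class initialization makes this thread-safe.
    private static final Solution INSTANCE = new Solution();

    // A private constructor prevents instantiation from outside the class.
    private Solution() {}

    // Global access point to the single instance.
    static Solution getInstance() {
        return INSTANCE;
    }

    // Illustrative task logic (an assumption, not taken from the benchmark).
    int add(int x, int y) {
        return x + y;
    }
}

// Hypothetical counter-example: functional code that ignores the pattern.
final class PlainSolution {
    int add(int x, int y) {
        return x + y;
    }
}

final class SingletonCheck {
    // Returns a binary verdict based on two assumed structural criteria.
    static boolean looksLikeSingleton(Class<?> cls) {
        // Criterion 1: every declared constructor must be private.
        for (Constructor<?> c : cls.getDeclaredConstructors()) {
            if (!Modifier.isPrivate(c.getModifiers())) {
                return false;
            }
        }
        // Criterion 2: some static no-argument method returns the class type
        // and yields the identical object on repeated calls.
        for (Method m : cls.getDeclaredMethods()) {
            boolean candidate = Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.getReturnType() == cls;
            if (!candidate) {
                continue;
            }
            try {
                m.setAccessible(true);
                Object first = m.invoke(null);
                Object second = m.invoke(null);
                if (first != null && first == second) {
                    return true;
                }
            } catch (ReflectiveOperationException e) {
                // An accessor that cannot be invoked does not count as a match.
            }
        }
        return false;
    }

    // Turns the verdict into a one-line message that a feedback loop could
    // send back to the model (the wording is an assumption).
    static String binaryFeedback(Class<?> cls) {
        return looksLikeSingleton(cls)
                ? "PASS: the class follows the Singleton pattern."
                : "FAIL: the class does not follow the Singleton pattern.";
    }

    public static void main(String[] args) {
        System.out.println(binaryFeedback(Solution.class));
        System.out.println(binaryFeedback(PlainSolution.class));
    }
}
```

In an iterative setup, a message like the one returned by `binaryFeedback` would be appended to the next prompt until the check passes or an iteration limit is reached. An extensive-feedback variant could instead report which criterion failed. A purely structural check of this kind says nothing about functionality, which would still be measured separately by running the benchmark's tests.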