In today's society, we are increasingly dependent on software systems, yet we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating a program from a description of its expected behavior. Program synthesis has been an active research field for decades, and recent approaches incorporate Large Language Models (LLMs) to help generate code. This paper explores the concept of LLM4TDD, in which we guide LLMs to generate code iteratively using a test-driven development (TDD) methodology. We conduct an empirical evaluation using ChatGPT and coding problems from LeetCode to investigate how different test, prompt, and problem attributes affect the efficacy of LLM4TDD.
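To make the iterative idea concrete, the following is a minimal sketch of a generic test-driven generation loop. It is not the paper's implementation and rests on assumptions of our own: tests are revealed to the model one at a time, failure messages for the accumulated tests are fed back in the prompt, and each new test gets a fixed retry budget (`max_attempts`). The names `run_tests`, `llm4tdd_loop`, and `make_scripted_model` are hypothetical, and the scripted model merely stands in for a real LLM such as ChatGPT; no actual LLM API is called.

```python
from typing import Callable, List, Tuple

# Hypothetical test format: (argument, expected_result) for a one-argument function.
TestCase = Tuple[object, object]


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Execute candidate source and return failure messages (empty list if all pass)."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"code did not load: {exc!r}"]
    failures = []
    for arg, expected in tests:
        try:
            actual = func(arg)
        except Exception as exc:
            failures.append(f"{func_name}({arg!r}) raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{func_name}({arg!r}) returned {actual!r}, expected {expected!r}")
    return failures


def llm4tdd_loop(generate: Callable[[str], str], problem: str, func_name: str,
                 tests: List[TestCase], max_attempts: int = 3) -> Tuple[str, bool]:
    """Add one test at a time; re-prompt the model until all tests so far pass."""
    active: List[TestCase] = []
    source = ""
    for test in tests:
        active.append(test)  # TDD step: introduce the next test
        failures = run_tests(source, func_name, active)
        attempts = 0
        while failures and attempts < max_attempts:
            prompt = (f"{problem}\nCurrent code:\n{source}\n"
                      "Failing tests:\n" + "\n".join(failures))
            source = generate(prompt)  # ask the (stubbed) model for revised code
            failures = run_tests(source, func_name, active)
            attempts += 1
        if failures:
            return source, False  # retry budget exhausted for this test
    return source, True


def make_scripted_model(responses: List[str]) -> Callable[[str], str]:
    """Stand-in for an LLM: returns canned code strings in order, ignoring the prompt."""
    it = iter(responses)

    def generate(prompt: str) -> str:
        return next(it)

    return generate


scripted = [
    "def solve(x):\n    return x\n",                      # passes only the first test
    "def solve(x):\n    return x if x >= 0 else -x\n",    # passes all tests
]
model = make_scripted_model(scripted)
code, ok = llm4tdd_loop(model, "Return the absolute value of x.", "solve",
                        [(3, 3), (-4, 4), (0, 0)])
print(ok)  # True
```

Revealing tests incrementally mirrors the red-green rhythm of TDD: the first candidate satisfies `(3, 3)`, the second test `(-4, 4)` exposes a bug, and the failure message drives the revision. A real setup would replace `make_scripted_model` with calls to an actual model and would likely vary how tests and problem statements are phrased in the prompt.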