Large language models (LLMs) have revolutionized code generation, significantly enhancing developer productivity. However, for the many users with minimal coding knowledge, LLMs provide little support, as they primarily generate isolated code snippets rather than complete, large-scale project code. Without coding expertise, these users struggle to interpret, modify, and iteratively refine LLM outputs, making it impossible to assemble a complete project. To address this issue, we propose the Self-Rectified Large-Scale Code Generator (SRLCG), a framework that generates complete multi-file project code from a single prompt. SRLCG employs a novel multidimensional chain-of-thought (CoT) and self-rectification to guide LLMs in generating correct and robust code files, and then integrates them into a complete and coherent project using our proposed dynamic backtracking algorithm. Experimental results show that SRLCG generates code 15x longer than DeepSeek-V3, 16x longer than GPT-4, and at least 10x longer than other leading CoT-based baselines. Further experiments confirm its improved correctness, robustness, and performance over these baselines in large-scale code generation.