Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation

Code generation is the task of automatically generating source code that conforms to a given programming specification, and it has received extensive attention, especially with the development of large language models (LLMs). Because code generation is inherently difficult, the code generated by LLMs may not align with the specification. To improve the performance of LLMs in code generation, several thought-eliciting prompting techniques have been proposed to guide LLMs toward understanding the specification. However, it remains hard to produce a correct understanding of complicated programming problems, which leads to unsatisfactory code generation performance. In addition, several feedback-based prompting techniques have been proposed to fix incorrect code using the error messages produced by test execution. However, when the generated code deviates significantly from the ground truth, these techniques struggle to improve performance from such coarse-grained information. In this work, we propose a novel prompting technique, called μFiX, that improves the code generation performance of LLMs by devising both sophisticated thought-eliciting prompting and feedback-based prompting and by making the first exploration of their synergy. In the thought-eliciting prompting phase, μFiX first exploits test case analysis to obtain an understanding of the specification and enables a self-improvement process to identify and fix misunderstandings. In the feedback-based prompting phase, μFiX further fixes the specification understanding in the direction that reduces the gap between the provided understanding and the actual understanding implicitly used by LLMs for code generation. By obtaining as correct an understanding as possible with μFiX, the code generation performance of LLMs can be largely improved.
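The two phases described in the abstract can be pictured with a rough Python sketch. This is not the authors' implementation: the prompt wording, the `llm` callable, the helper names `run_tests` and `two_phase_generate`, and the `max_rounds` bound are all assumptions made for illustration. It only shows the general shape of the loop: an understanding derived from test case analysis, a self-improvement pass over that understanding, and then feedback rounds that revise the understanding (rather than only the code) when tests fail.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]        # hypothetical interface: prompt in, text out
TestCase = Tuple[tuple, object]   # (positional arguments, expected output)


def run_tests(code: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Execute candidate code and return one message per failing test."""
    namespace: dict = {}
    try:
        # Illustration only: untrusted generated code should run in a sandbox.
        exec(code, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"could not load {func_name}: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures


def two_phase_generate(
    llm: LLM,
    spec: str,
    func_name: str,
    tests: List[TestCase],
    max_rounds: int = 3,  # assumed bound, not taken from the paper
) -> Tuple[str, str, bool]:
    """Return (code, understanding, all_tests_passed)."""
    # Phase 1 (thought-eliciting): derive an understanding from the test cases,
    # then ask the model to check and revise that understanding.
    understanding = llm(f"ANALYZE\nSpec: {spec}\nTests: {tests}")
    understanding = llm(
        f"REFINE\nSpec: {spec}\nTests: {tests}\nUnderstanding: {understanding}"
    )
    code = llm(f"CODE\nSpec: {spec}\nUnderstanding: {understanding}")

    # Phase 2 (feedback-based): on failure, revise the understanding using the
    # failing code and messages, then regenerate code from the revision.
    for _ in range(max_rounds):
        failures = run_tests(code, func_name, tests)
        if not failures:
            return code, understanding, True
        understanding = llm(
            f"FIX_UNDERSTANDING\nSpec: {spec}\nUnderstanding: {understanding}\n"
            f"Code: {code}\nFailures: {failures}"
        )
        code = llm(f"CODE\nSpec: {spec}\nUnderstanding: {understanding}")
    return code, understanding, not run_tests(code, func_name, tests)
```

Any function that maps a prompt string to a response string can be passed as `llm`, so the loop can be exercised with a stub before a real model client is wired in. How μFiX actually constructs its prompts and measures the gap between the provided and the implicitly used understanding is not specified in the abstract and is not modeled here.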
