Code generation aims to automatically produce source code that conforms to a given programming specification, and it has received extensive attention, especially with the development of large language models (LLMs). Because code generation is inherently difficult, the code generated by LLMs may still fail to align with the specification. To improve the performance of LLMs in code generation, several Chain of Thought (CoT) techniques have been proposed to guide LLMs toward understanding the programming task before generating code. However, these techniques still struggle to work out complicated programming logic from the (concise) specification, leading to unsatisfactory code generation performance. In this work, we propose the first test-case-driven CoT technique, called TCoT, to further enhance the ability of LLMs in code generation. TCoT understands the programming specification from the novel perspective of test cases, which aligns with the human practice of using examples to understand complicated problems. Because a test case specifies the expected output, TCoT can immediately check the correctness of its programming understanding and then refine it to be as correct as possible before code generation. In this way, it is more likely to generate correct code. Our evaluation on 6 datasets and 14 baselines demonstrates the effectiveness of TCoT. For example, TCoT improves ChatGPT by 13.93%~69.44% in terms of Pass@1 (the ratio of programming problems for which the generated code passes all test cases), and outperforms the existing CoT technique by 12.14%~53.72% in terms of Pass@1.
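The check-and-refine idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `refine_understanding`, the `explain` callback (a stand-in for an LLM call), the feedback format, and the `max_rounds` limit are all hypothetical names and choices introduced here for illustration. The sketch assumes the model's understanding can be expressed as a reasoning trace plus an output derived for each test input, which is then compared against the expected output.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[str, str]  # (test input, expected output)

# Hypothetical LLM interface: given the specification, a test input, and
# feedback from the previous round, return (reasoning trace, derived output).
Explain = Callable[[str, str, str], Tuple[str, str]]


def refine_understanding(
    spec: str,
    tests: List[TestCase],
    explain: Explain,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Refine the understanding until it reproduces every expected output.

    Returns the latest reasoning and whether it matched all test cases.
    """
    feedback = ""
    reasoning = ""
    for _ in range(max_rounds):
        traces = []
        mismatches = []
        for test_input, expected in tests:
            trace, derived = explain(spec, test_input, feedback)
            traces.append(trace)
            # The expected output makes the understanding checkable.
            if derived.strip() != expected.strip():
                mismatches.append(
                    f"input {test_input!r}: derived {derived!r}, "
                    f"expected {expected!r}"
                )
        reasoning = "\n".join(traces)
        if not mismatches:
            return reasoning, True
        feedback = "\n".join(mismatches)  # drives the next refinement round
    return reasoning, False


def stub_explain(spec: str, test_input: str, feedback: str) -> Tuple[str, str]:
    """Stand-in for an LLM: wrong at first, correct once it sees feedback."""
    if not feedback:
        return ("echo the input unchanged", test_input)
    total = sum(int(ch) for ch in test_input)
    return (f"sum digits of {test_input} = {total}", str(total))


spec = "Return the sum of the decimal digits of the input."
tests = [("123", "6"), ("40", "4")]
reasoning, ok = refine_understanding(spec, tests, stub_explain)
```

In this sketch, the verified reasoning would then be passed to the model together with the specification when requesting the final code. How TCoT actually structures that prompt is not specified in the text above.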
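The Pass@1 metric, defined above as the ratio of programming problems for which the generated code passes all test cases, can be computed as in the following sketch. It assumes one generated solution per problem and a hypothetical result format (a mapping from problem ID to per-test pass/fail flags). The function name `pass_at_1` and the sample data are illustrative and are not taken from the evaluation.

```python
from typing import Dict, List


def pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Fraction of problems whose single generated solution passes every test.

    `results` maps a problem ID to the pass/fail outcome of each test case.
    A problem with no recorded test outcomes is counted as unsolved.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(
        1 for outcomes in results.values() if outcomes and all(outcomes)
    )
    return solved / len(results)


# Illustrative data only: four problems, two of them fully passing.
sample_results = {
    "p1": [True, True],
    "p2": [True, False],
    "p3": [True],
    "p4": [False],
}
score = pass_at_1(sample_results)  # 2 of 4 problems solved
```

A problem counts as solved only when every one of its test cases passes, so a single failing test makes the whole problem count as a failure.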