Towards Enhancing In-Context Learning for Code Generation

In-context learning (ICL) with pre-trained language models (PTLMs) has shown great success in code generation. ICL requires no training: a PTLM takes as input a prompt consisting of a few requirement-code examples followed by a new requirement, and outputs a new program. However, existing studies simply reuse ICL techniques designed for natural language generation and ignore features unique to code generation; we refer to these studies as standard ICL. Inspired by observations of how humans write code, we propose AceCoder, a novel ICL approach for code generation. Compared to standard ICL, AceCoder introduces two novelties. (1) Example retrieval: it retrieves similar programs as examples so that PTLMs can learn programming skills (e.g., algorithms, APIs) from them. (2) Guided code generation: it encourages PTLMs to output an intermediate preliminary (e.g., test cases, APIs) before generating the program. The preliminary helps PTLMs understand the requirement and guides the subsequent code generation. We apply AceCoder to six PTLMs (e.g., Codex) and evaluate it on three public benchmarks using Pass@k. Results show that AceCoder significantly improves the code generation performance of PTLMs. (1) In terms of Pass@1, AceCoder outperforms standard ICL by up to 79.7% and fine-tuned models by up to 171%. (2) AceCoder is effective across PTLMs of different sizes (e.g., 1B to 175B parameters) and different programming languages (e.g., Python, Java, and JavaScript). (3) We investigate multiple choices for the intermediate preliminary. (4) We manually evaluate the generated programs in three aspects, and the results confirm AceCoder's superiority. (5) Finally, we discuss insights about ICL for practitioners.
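The two novelties can be illustrated with a minimal prompt-construction sketch. It is not AceCoder's actual implementation. It assumes a hypothetical pool of (requirement, preliminary, code) examples, ranks them with a simple token-overlap (Jaccard) similarity as a stand-in for whatever retriever the approach really uses, and formats the prompt so that the model is asked to write test cases (the preliminary) before the code. The names `Example`, `similarity`, `retrieve`, and `build_prompt`, as well as the `# Requirement:` / `# Test cases:` / `# Code:` layout, are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical retrievable example: requirement, preliminary, and program."""
    requirement: str
    preliminary: str  # e.g., test cases written before the code
    code: str


def tokenize(text: str) -> set[str]:
    # Lowercased identifier-like words and numbers; a crude stand-in for a real tokenizer.
    return set(re.findall(r"[A-Za-z_]+|\d+", text.lower()))


def similarity(a: str, b: str) -> float:
    # Jaccard overlap of token sets (assumed retriever, not necessarily AceCoder's).
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query: str, pool: list[Example], k: int = 2) -> list[Example]:
    # Example retrieval: keep the k examples whose requirements best match the query.
    ranked = sorted(pool, key=lambda ex: similarity(query, ex.requirement), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: list[Example]) -> str:
    # Each demonstration shows requirement -> preliminary (tests) -> code.
    parts = [
        f"# Requirement:\n{ex.requirement}\n# Test cases:\n{ex.preliminary}\n# Code:\n{ex.code}\n"
        for ex in examples
    ]
    # Guided generation: the prompt stops at "# Test cases:", so the model
    # first writes the preliminary and only then continues with the code.
    parts.append(f"# Requirement:\n{query}\n# Test cases:\n")
    return "\n".join(parts)


pool = [
    Example("Return the sum of all integers in a list.",
            "assert total([1, 2, 3]) == 6",
            "def total(xs):\n    return sum(xs)"),
    Example("Reverse a string.",
            "assert rev('ab') == 'ba'",
            "def rev(s):\n    return s[::-1]"),
    Example("Return the maximum value in a list of integers.",
            "assert biggest([1, 5, 2]) == 5",
            "def biggest(xs):\n    return max(xs)"),
]

query = "Return the sum of a list of integers"
prompt = build_prompt(query, retrieve(query, pool, k=2))
print(prompt)
```

Here the two list-processing examples are selected and the unrelated string example is dropped, so the demonstrations share the algorithmic pattern the new requirement needs. A lexical score keeps the sketch dependency-free; a practical system would more likely use a stronger retriever such as BM25 or dense embeddings.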

翻译：上下文学习（ICL）与预训练语言模型（PTLM）已在代码生成中展现出巨大成功。ICL无需训练，PTLM以包含若干需求-代码示例和新需求的提示作为输入，并输出新程序。然而，现有研究简单复用自然语言生成的ICL技术，忽视了代码生成的独有特征，我们称此类研究为标准ICL。受人类编码过程的观察启发，我们提出了一种名为AceCoder的新型代码生成ICL方法。与标准ICL相比，AceCoder具有两大创新：（1）示例检索：它检索相似程序作为示例，并从中学习编程技能（如算法、API）；（2）引导式代码生成：它引导PTLM在生成程序前输出中间预备产物（如测试用例、API），帮助PTLM理解需求并指导后续代码生成。我们将AceCoder应用于六种PTLM（如Codex），并在三个公开基准上使用Pass@k进行评估。结果表明，AceCoder能显著提升PTLM在代码生成上的性能：（1）在Pass@1指标上，AceCoder相较于标准ICL最高提升79.7%，相较于微调模型最高提升171%；（2）AceCoder对不同规模（如1B至175B参数）和不同编程语言（如Python、Java和JavaScript）的PTLM均有效；（3）我们探究了多种中间预备产物的选择方案；（4）我们从三个方面人工评估生成程序，证明了AceCoder的优越性；（5）最后，我们为从业者梳理了关于ICL的若干见解。