Large language models (LLMs) have made significant strides in code generation through improved model design, training, and chain-of-thought prompting. However, prompt-level optimization remains an important yet under-explored aspect of LLMs for coding. This work focuses on the few-shot examples present in most code generation prompts, offering a systematic study of whether few-shot examples improve LLMs' coding capabilities, which few-shot examples have the largest impact, and how to select impactful examples. We offer two approaches for selecting few-shot examples: a model-free method, CODEEXEMPLAR-FREE, and a model-based method, CODEEXEMPLAR-BASED. The two methods trade off improved performance against reliance on training data and interpretability. Both methods significantly improve CodeLlama's performance on the popular HumanEval+ coding benchmark. In summary, our work provides valuable insights into how to pick few-shot examples in code generation prompts to improve LLM code generation capabilities.