Since their introduction, Large Language Models (LLMs) have been widely adopted for tasks such as text summarization, question answering, and speech-to-text translation. More recently, LLM-based code generation has gained significant attention, with tools such as Cursor and Windsurf demonstrating the ability to analyze massive code repositories and recommend relevant changes. Big tech companies have likewise acknowledged a growing reliance on LLM-generated code within their codebases. Although these advances significantly improve developer productivity, increasing reliance on automated code generation proportionally increases the risk of suboptimal solutions and insecure code. Our work focuses on automatically sampling In-Context Learning (ICL) demonstrations, which can improve model performance and enhance the interpretability of the generated code. Using AST-based analysis of outputs on the MBPP test set, we identify the regions of code most influenced by the chosen demonstrations. Our experiments show that high-quality ICL demonstrations not only make outputs easier to interpret but also yield a positive improvement on the pass@10 metric. Conversely, poorly chosen ICL demonstrations degraded pass@10 performance relative to the base model. Overall, our approach highlights the importance of efficient sampling strategies for ICL, which can affect model performance on any given task.
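For context, the pass@10 metric referenced above is conventionally computed with the standard unbiased pass@k estimator (the probability that at least one of k samples, drawn without replacement from n generations of which c pass the unit tests, is correct). A minimal sketch, with an illustrative function name:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total generations sampled per problem
    c: number of generations that pass all unit tests
    k: sample budget (k = 10 for pass@10)
    Returns 1 - C(n - c, k) / C(n, k): the probability that a random
    draw of k generations contains at least one correct one.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k-sample draw
        # must include a correct generation.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 generations of which c = 0 pass, pass@10 is 0.0; with any c ≥ 1 and k = n = 10, it is 1.0. A per-benchmark score averages this quantity over all problems.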