Generating executable code from natural-language instructions with Large Language Models (LLMs) poses challenges such as semantic ambiguity and understanding task-specific contexts. To address these issues, we propose DemoCraft, a system that enhances code generation by leveraging in-context learning and demonstration selection, combined with latent concept learning. Latent concept learning introduces additional concept tokens: trainable embeddings that capture task-specific knowledge. We evaluate our system on two major datasets, MBPP and HumanEval. Our experimental results demonstrate that the proposed system achieves an approximately 2x increase in the pass@k metric compared to baseline models. Furthermore, we introduce two novel evaluation metrics, correctness@k and similarity@k, and our empirical studies indicate that our system attains nearly a 3x improvement on these metrics as well.
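For reference, pass@k is the standard code-generation metric reported on MBPP and HumanEval: the probability that at least one of k sampled completions passes all unit tests. A minimal sketch of its usual unbiased estimator (computed from n generated samples of which c are correct; the function name is illustrative, not from DemoCraft):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem
    c: number of samples that pass all unit tests
    k: budget of samples considered
    Returns the probability that at least one of k draws
    (without replacement) from the n samples is correct.
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any
        # k-subset must contain at least one correct one.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The dataset-level score is the mean of this estimate over all problems; the roughly 2x gain reported above refers to this quantity relative to the baselines.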