Large language models (LLMs) have shown remarkable capabilities for code generation in recent years. Unlike existing work, which evaluates the correctness of LLM-generated code, we propose to also evaluate its efficiency. More efficient code improves the performance of programs and software built with LLM-assisted programming. First, we evaluate the efficiency of LLM-generated code on two benchmarks, HumanEval and MBPP. Then, we select a set of programming problems from the online judge platform LeetCode for a more challenging evaluation. Finally, we explore several prompts that could enable LLMs to generate more efficient code.
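The abstract does not describe how efficiency is measured, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes efficiency is approximated by wall-clock runtime and compares two functionally correct solutions to the same toy problem; the function names `has_duplicate_quadratic`, `has_duplicate_linear`, and `median_runtime` are hypothetical and chosen for illustration.

```python
import timeit
from typing import Callable, List


def has_duplicate_quadratic(nums: List[int]) -> bool:
    """Correct but slow: compares every pair, O(n^2) time."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False


def has_duplicate_linear(nums: List[int]) -> bool:
    """Correct and usually faster: one pass with a set, O(n) expected time."""
    seen = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False


def median_runtime(
    func: Callable[[List[int]], bool], data: List[int], repeats: int = 5
) -> float:
    """Median wall-clock seconds for one call of func(data) over several repeats."""
    timings = timeit.repeat(lambda: func(data), number=1, repeat=repeats)
    return sorted(timings)[len(timings) // 2]


if __name__ == "__main__":
    data = list(range(1000))  # no duplicates, so both functions scan everything
    # Both candidates pass the same correctness check...
    assert has_duplicate_quadratic(data) == has_duplicate_linear(data)
    # ...but their measured runtimes can differ substantially.
    for candidate in (has_duplicate_quadratic, has_duplicate_linear):
        print(candidate.__name__, median_runtime(candidate, data))
```

Both functions would pass a correctness-only benchmark, which is why a separate efficiency measurement adds information. Taking the median over several repeats is one simple way to dampen timing noise; a real evaluation would likely also control for hardware, input size, and memory usage.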