This study evaluates the efficiency of code generated by Large Language Models (LLMs) and measures its performance against human-crafted solutions using a dataset from Leetcode. We compare 18 LLMs, examining the impact of factors such as model temperature and success rate on code performance. This research introduces a novel method for measuring and comparing the speed of LLM-generated code, revealing that the generated code performs comparably regardless of which LLM is used. We also find that LLMs are capable of generating code that is, on average, more efficient than code written by humans. The paper further discusses the use of Leetcode as a benchmarking dataset, the limitations imposed by potential data contamination, and the reliability of the platform's measurements. We believe that our findings contribute to a better understanding of LLM capabilities in code generation and set the stage for future optimizations in the field.