Large Language Models (LLMs) have demonstrated impressive capabilities in understanding and generating code. Building on these capabilities, many recent methods have been proposed to automatically refine code with LLMs. However, refined code (whether produced by LLMs or by humans) is not always more efficient than its original version. On the other hand, executing and benchmarking two versions of a program every time they must be compared is costly and time-consuming. Therefore, in this work, we propose a novel method based on a code language model trained to judge the relative efficiency of two code versions (generated by humans or machines), either by classifying which one is superior or by predicting the relative improvement. We validate our method on multiple programming languages and across multiple refinement steps, demonstrating that it can effectively distinguish more efficient from less efficient versions of code.
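As a minimal sketch of the two prediction targets described above, the snippet below derives, from the measured runtimes of an original and a refined version of a program, (a) a binary label for which version is superior and (b) the relative improvement. The function name and label convention are illustrative assumptions, not the paper's implementation; the proposed method trains a code language model to predict these quantities directly from source code, without executing it.

```python
# Hypothetical label construction for a pairwise code-efficiency judge.
# Assumption: runtimes are wall-clock measurements of the two versions
# on the same input; the trained model would predict these labels from
# the code text alone, without running either version.

def make_labels(runtime_orig: float, runtime_refined: float):
    """Return (binary label, relative improvement) for a code pair.

    binary label: 1 if the refined version is faster, else 0
    relative improvement: fraction of runtime saved by the refinement
    """
    label = 1 if runtime_refined < runtime_orig else 0
    rel_improvement = (runtime_orig - runtime_refined) / runtime_orig
    return label, rel_improvement

# Example: a refinement that cuts runtime from 2.0 s to 1.5 s
label, imp = make_labels(2.0, 1.5)
print(label, imp)  # 1 0.25
```

Note that the regression target is signed: a refinement that slows the code down yields a negative relative improvement, which lets a single head capture both helpful and harmful refinements.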