Executing computer programs described in natural language has long been a goal of computer science. The improved natural language understanding of large language models (LLMs) opens a path toward this goal. In this paper, we examine how well present-day LLMs comprehend and execute algorithms described in natural language. We built an algorithm test set drawn from Introduction to Algorithms, a well-known textbook that covers many representative, widely used algorithms. To systematically assess LLMs' code execution abilities, we selected 30 algorithms, generated 300 randomly sampled instances in total, and evaluated whether popular LLMs can understand and execute these algorithms. Our findings show that LLMs, notably GPT-4, can effectively execute programs described in natural language, as long as no heavy numeric computation is involved. We believe these findings contribute to the evaluation of LLMs' code execution abilities and will encourage further investigation and application of the computational power of LLMs.
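The evaluation protocol summarized above (sample random instances for an algorithm, then check whether the executed result matches the correct output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual harness: insertion sort stands in for one of the 30 algorithms, 10 instances per algorithm is only inferred from 300 / 30, and `executor` is a hypothetical callable that would wrap an LLM prompt and its parsed answer.

```python
import random
from typing import Callable, List


def insertion_sort(values: List[int]) -> List[int]:
    """Reference implementation used to compute the expected output."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def make_instances(count: int, size: int, seed: int) -> List[List[int]]:
    """Generate `count` random integer arrays of length `size`."""
    rng = random.Random(seed)
    return [[rng.randint(0, 99) for _ in range(size)] for _ in range(count)]


def accuracy(executor: Callable[[List[int]], List[int]],
             instances: List[List[int]]) -> float:
    """Fraction of instances where the executor matches the reference."""
    correct = sum(
        1 for inst in instances if executor(inst) == insertion_sort(inst)
    )
    return correct / len(instances)


# Assumption: 10 instances per algorithm (300 instances / 30 algorithms).
instances = make_instances(count=10, size=8, seed=0)

# `sorted` is a placeholder for the hypothetical LLM-backed executor.
print(accuracy(sorted, instances))
```

In a real evaluation, `executor` would send the natural-language algorithm description and the instance to a model and parse its reply; exact-match scoring is one simple choice among several.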