Code Simulation Challenges for Large Language Models

We investigate the extent to which Large Language Models (LLMs) can simulate the execution of computer code and algorithms. We begin by looking at straight line programs, and show that current LLMs demonstrate poor performance even with such simple programs -- performance rapidly degrades with the length of code. We then investigate the ability of LLMs to simulate programs that contain critical paths and redundant instructions. We also go beyond straight line program simulation with sorting algorithms and nested loops, and we show the computational complexity of a routine directly affects the ability of an LLM to simulate its execution. We observe that LLMs execute instructions sequentially and with a low error margin only for short programs or standard procedures. LLMs' code simulation is in tension with their pattern recognition and memorisation capabilities: on tasks where memorisation is detrimental, we propose a novel prompting method to simulate code execution line by line. Empirically, our new Chain of Simulation (CoSm) method improves on the standard Chain of Thought prompting approach by avoiding the pitfalls of memorisation.

翻译：我们研究了大型语言模型（LLMs）模拟计算机代码和算法执行的程度。首先从直线型程序入手，发现当前LLMs即使处理这类简单程序也表现不佳——其性能随代码长度增加而迅速下降。接着探讨了LLMs模拟含有关键路径和冗余指令的程序的能力，并进一步研究了排序算法和嵌套循环等超越直线型程序的模拟场景。实验表明，程序的计算复杂度直接影响LLMs模拟执行的能力。我们观察到LLMs仅对短程序或标准流程能实现低错误率的顺序指令执行。LLMs的代码模拟能力与其模式识别和记忆能力存在矛盾：在记忆会干扰任务时，我们提出了一种逐行模拟代码执行的新型提示方法。实验证明，我们的新方法"模拟链"（Chain of Simulation, CoSm）通过规避记忆缺陷，在标准思维链提示方法基础上实现了性能提升。

相关内容

大语言模型

关注 67

大语言模型是基于海量文本数据训练的深度学习模型。它不仅能够生成自然语言文本，还能够深入理解文本含义，处理各种自然语言任务，如文本摘要、问答、翻译等。2023年，大语言模型及其在人工智能领域的应用已成为全球科技研究的热点，其在规模上的增长尤为引人注目，参数量已从最初的十几亿跃升到如今的一万亿。参数量的提升使得模型能够更加精细地捕捉人类语言微妙之处，更加深入地理解人类语言的复杂性。在过去的一年里，大语言模型在吸纳新知识、分解复杂任务以及图文对齐等多方面都有显著提升。随着技术的不断成熟，它将不断拓展其应用范围，为人类提供更加智能化和个性化的服务，进一步改善人们的生活和生产方式。

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日