Important tasks such as reasoning and planning are fundamentally algorithmic: solving them robustly requires acquiring true reasoning or planning algorithms rather than shortcuts. Large Language Models lack true algorithmic ability primarily because of limitations in neural network optimization (the optimization algorithm, the training data, and the objective), but also because of architectural inexpressivity. To address this, we propose augmenting LLMs with a library of fundamental operations and sophisticated differentiable programs, so that common algorithms do not need to be learned from scratch. We add memory, registers, basic operations, and adaptive recurrence to a transformer architecture built on LLaMA3. We then define a method for compiling algorithms directly into a differentiable starting library, which the model uses natively and which propagates gradients for optimization. In this preliminary study, we explore the feasibility of augmenting LLaMA3 with a differentiable computer, for instance by fine-tuning small transformers on simple algorithmic tasks with variable computational depth.
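To make the "memory and registers" augmentation concrete, the following is a minimal sketch (not the paper's actual implementation; all function names and shapes are illustrative assumptions) of how register access can be made differentiable: a discrete register index is replaced by a softmax over address logits, so reads and writes become smooth operations through which gradients can propagate.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_read(registers, address_logits):
    """Differentiable read: a convex combination of register contents."""
    w = softmax(address_logits)       # soft address over the n registers
    return w @ registers              # weighted sum, shape (d,)

def soft_write(registers, address_logits, value):
    """Differentiable write: blend `value` into each register
    in proportion to its soft-address weight."""
    w = softmax(address_logits)[:, None]   # shape (n, 1)
    return (1.0 - w) * registers + w * value

# Toy usage: 4 registers of width 3; a sharply peaked address
# approximates a discrete write/read to register 0.
regs = np.zeros((4, 3))
addr = np.array([10.0, 0.0, 0.0, 0.0])
regs = soft_write(regs, addr, np.ones(3))
out = soft_read(regs, addr)           # close to the all-ones vector
```

The same soft-addressing trick underlies differentiable memories more generally; in a trained model the address logits would be produced by the transformer rather than fixed by hand.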