Large Language Models demonstrate remarkable mathematical capabilities yet struggle with abstract reasoning and planning. In this study, we explore whether Transformers can learn to abstract and generalize the rules governing Elementary Cellular Automata. By training Transformers on state sequences generated from random initial conditions and local rules, we show that they can generalize across different Boolean functions of fixed arity, effectively abstracting the underlying rules. While the models achieve high accuracy in next-state prediction, their performance declines sharply in multi-step planning tasks without intermediate context. Our analysis reveals that including future states or rule prediction in the training loss enhances the models' ability to form internal representations of the rules, leading to improved performance over longer planning horizons and in autoregressive generation. Furthermore, we confirm that increasing model depth plays a crucial role in the extended sequential computations required for complex reasoning tasks. This highlights the potential to improve LLMs by including longer horizons in the loss function, as well as by incorporating recurrence and adaptive computation time for dynamic control of model depth.
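To make the data-generating process concrete, the following is a minimal sketch (not the authors' implementation) of one synchronous Elementary Cellular Automaton update under a Wolfram rule number, with periodic boundary conditions assumed, unrolled to produce the kind of state sequences described above:

```python
import numpy as np

def eca_step(state: np.ndarray, rule: int) -> np.ndarray:
    """One synchronous ECA update under Wolfram rule number `rule` (0-255).

    Each cell's next value is the rule's output bit for its
    (left, center, right) neighborhood; boundaries are periodic.
    """
    left = np.roll(state, 1)    # left neighbor of each cell
    right = np.roll(state, -1)  # right neighbor of each cell
    idx = (left << 2) | (state << 1) | right  # 3-bit neighborhood index, 0..7
    rule_table = (rule >> np.arange(8)) & 1   # rule's output bit for each index
    return rule_table[idx]

# Hypothetical data generation: sample a random rule and initial condition,
# then unroll a few steps to obtain a training state sequence.
rng = np.random.default_rng(0)
rule = int(rng.integers(256))
state = rng.integers(0, 2, size=32)
trajectory = [state]
for _ in range(8):
    state = eca_step(state, rule)
    trajectory.append(state)
```

Because every local rule of arity 3 is just a lookup table of 8 output bits, sampling `rule` uniformly exposes the model to many distinct Boolean functions of the same fixed arity, which is what the generalization claim concerns.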
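Likewise, a hedged sketch of the augmented training objective alluded to above: a next-state term combined with auxiliary future-state and rule-prediction terms. The function name, head shapes, and weighting scheme are illustrative assumptions, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def augmented_loss(next_logits: torch.Tensor,
                   future_logits: torch.Tensor,
                   rule_logits: torch.Tensor,
                   next_target: torch.Tensor,
                   future_target: torch.Tensor,
                   rule_target: torch.Tensor,
                   lam_future: float = 1.0,
                   lam_rule: float = 1.0) -> torch.Tensor:
    """Hypothetical combined objective (names and weights are assumptions):
    next-state prediction plus auxiliary future-state and rule-prediction
    terms, each a per-bit binary cross-entropy over float 0/1 targets."""
    loss = F.binary_cross_entropy_with_logits(next_logits, next_target)
    loss = loss + lam_future * F.binary_cross_entropy_with_logits(future_logits, future_target)
    loss = loss + lam_rule * F.binary_cross_entropy_with_logits(rule_logits, rule_target)
    return loss
```

The auxiliary terms force the hidden states to carry information sufficient to predict beyond the next step, which is one way to encourage an internal representation of the rule itself rather than a shallow one-step shortcut.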