While Large Language Models (LLMs) excel at reasoning, whether they can sustain persistent latent states remains under-explored. The capacity to maintain and manipulate unexpressed internal representations, analogous to human working memory, is a cornerstone of complex reasoning. In this paper, we formalize and quantify the "Latent State Persistence" (LSP) gap through three novel experiments. First, we use a Number Guessing Game to demonstrate that, across independent queries, LLMs fail to allocate probability mass to a single hidden choice, violating a fundamental principle of probabilistic coherence. Second, we employ a Yes-No Game to show that as the number of questions grows, LLMs suffer from "concept drift," leading to inevitable self-contradictions in the absence of LSP. Finally, inspired by Mathematical Mentalism, we task models with tracking transformations of hidden variables, revealing failures of variable binding and state evolution when the initial state is not explicitly present in the context. Collectively, these findings suggest that LLMs function as reactive post-hoc solvers rather than proactive planners with LSP. Our work provides a framework for evaluating the fidelity of internal representations and highlights a fundamental architectural divergence between autoregressive transformers and human-like cognition.
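The coherence principle behind the Number Guessing Game can be illustrated with a toy simulation (this is a minimal sketch, not the paper's actual protocol; the player models and the `p_yes` probability are illustrative assumptions). A player that truly commits to one hidden number must, across independent "is your number k?" queries, have yes-probabilities that sum to exactly one over the candidate set; a reactive player that answers each query independently, with no persistent latent state, can violate this constraint:

```python
import random

CANDIDATES = list(range(1, 11))  # hypothetical game: pick a number from 1 to 10

def committed_answer(k, rng):
    # Player with a persistent latent state: samples a secret number once,
    # then answers the query faithfully against that commitment.
    secret = rng.choice(CANDIDATES)
    return secret == k

def stateless_answer(k, rng, p_yes=0.2):
    # Reactive player: no hidden commitment; each "is it k?" query is
    # answered independently with some fixed yes-probability.
    return rng.random() < p_yes

def yes_mass(answer_fn, trials=5000):
    """Estimate the sum over k of P(yes to 'is your number k?'),
    treating every (trial, k) query as independent."""
    rng = random.Random(0)
    hits = sum(answer_fn(k, rng) for _ in range(trials) for k in CANDIDATES)
    return hits / trials

# Committed player: exactly one candidate matches the secret per trial,
# so the estimated yes-mass concentrates near 1.0.
# Stateless player: 10 candidates x 0.2 yes-probability gives mass near 2.0,
# violating the sum-to-one coherence a genuine hidden choice would enforce.
print(yes_mass(committed_answer), yes_mass(stateless_answer))
```

The point of the sketch is that the violation is detectable purely from black-box query statistics, which is what lets the experiment probe an internal (unexpressed) state without ever observing it directly.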