While Large Language Models (LLMs) excel at reasoning, whether they can sustain persistent latent states remains under-explored. The capacity to maintain and manipulate unexpressed internal representations, analogous to human working memory, is a cornerstone of complex reasoning. In this paper, we formalize and quantify the "Latent State Persistence" (LSP) gap through three novel experiments. First, we use a Number Guessing Game to demonstrate that, across independent queries, LLMs fail to allocate probability mass to a single hidden choice, violating the basic requirement that probabilities over mutually exclusive candidates sum to one. Second, we employ a Yes-No Game to show that as the number of questions grows, LLMs suffer from "concept drift" and, lacking LSP, inevitably contradict themselves. Finally, inspired by Mathematical Mentalism, we task models with tracking transformations applied to hidden variables, revealing failures of variable binding and state evolution whenever the initial state is not explicitly present in the context. Collectively, these findings suggest that LLMs function as reactive post-hoc solvers rather than proactive planners endowed with LSP. Our work provides a framework for evaluating the fidelity of internal representations and highlights a fundamental architectural divergence between autoregressive transformers and human-like cognition.
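To make the first probe concrete, the following is a minimal sketch of the consistency check the Number Guessing Game implies, assuming a stateless one-shot query interface; `ask_model` is a hypothetical placeholder (stubbed here with a simulator), not the paper's actual harness or any real API.

```python
# Minimal sketch of the Number Guessing Game consistency probe.
# Assumption: each call to ask_model is an independent, stateless session.
import random

LOW, HIGH = 1, 10          # advertised guessing range
TRIALS_PER_K = 50          # independent sessions per candidate number

def ask_model(prompt: str) -> str:
    """Hypothetical one-shot model call; replace with a real inference API.
    The stub simulates a model with no persistent latent choice: it answers
    'yes' with a fixed 30% chance regardless of which number is asked about."""
    return "yes" if random.random() < 0.30 else "no"

def yes_rate(k: int) -> float:
    """Estimate how often the model affirms that its hidden number is k."""
    prompt = (f"Silently pick a whole number from {LOW} to {HIGH} and keep it "
              f"secret. Is your number {k}? Answer only yes or no.")
    hits = sum(ask_model(prompt).strip().lower().startswith("yes")
               for _ in range(TRIALS_PER_K))
    return hits / TRIALS_PER_K

# If the model truly committed to one hidden number per session and answered
# "is it k?" truthfully, the yes-rate for k would estimate p(choice = k), so
# the rates must sum to ~1 across the range; a large deviation signals that
# no single latent choice is being maintained.
rates = {k: yes_rate(k) for k in range(LOW, HIGH + 1)}
print(f"per-number yes-rates: {rates}")
print(f"sum of yes-rates = {sum(rates.values()):.2f}  (consistent latent choice => ~1.00)")
```

With the stub above the rates sum to roughly 3.0 rather than 1.0, illustrating the kind of normalization violation the probe is designed to detect.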