Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents

Theory of Mind (ToM), the ability to model others' mental states, is fundamental to human social cognition. Whether large language models (LLMs) can develop ToM has been tested exclusively through static vignettes, leaving open whether ToM-like reasoning can emerge through dynamic interaction. Here we report that autonomous LLM agents playing extended sessions of Texas Hold'em poker progressively develop sophisticated opponent models, but only when equipped with persistent memory. In a 2×2 factorial design crossing memory (present/absent) with domain knowledge (present/absent), with five replications per condition (N = 20 experiments, ~6,000 agent-hand observations), we find that memory is both necessary and sufficient for the emergence of ToM-like behavior (Cliff's delta = 1.0, p = 0.008). Agents with memory reach ToM Levels 3-5 (predictive to recursive modeling), while agents without memory remain at Level 0 across all replications. Strategic deception grounded in opponent models occurs exclusively in memory-equipped conditions (Fisher's exact test, p < 0.001). Domain expertise does not gate the emergence of ToM-like behavior but enhances its application: agents without poker knowledge reach equivalent ToM levels but deceive less precisely (p = 0.004). Agents with ToM deviate from game-theoretically optimal play (67% vs. 79% adherence to a tight-aggressive (TAG) strategy, Cliff's delta = -1.0, p = 0.008) to exploit specific opponents, mirroring expert human play. All mental models are expressed in natural language and are directly readable, providing a transparent window into AI social cognition. Cross-model validation with GPT-4o yields a weighted Cohen's kappa of 0.81 (almost perfect agreement). These findings demonstrate that functional ToM-like behavior can emerge from interaction dynamics alone, without explicit training or prompting, with implications for understanding both artificial social intelligence and biological social cognition.
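
A Cliff's delta of 1.0 paired with p = 0.008 is what complete separation between two groups of five runs produces under an exact two-sided permutation test, since only 2 of the C(10, 5) = 252 possible group splits are that extreme (2/252 ≈ 0.008). The abstract does not say which test was used or how runs were grouped, so the following is only a minimal sketch under that assumption. The function names and the per-run ToM levels are hypothetical illustrations, not the study's code or data.

```python
from itertools import combinations
from math import comb


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def exact_two_sided_p(xs, ys):
    """Exact permutation p-value, using |Cliff's delta| as the test statistic.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and counts the splits at least as extreme as the observed one.
    """
    pooled = list(xs) + list(ys)
    n = len(xs)
    observed = abs(cliffs_delta(xs, ys))
    hits = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(cliffs_delta(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / comb(len(pooled), n)


# Hypothetical per-run peak ToM levels (illustrative only, NOT data from the study).
with_memory = [3, 4, 4, 5, 5]
without_memory = [0, 0, 0, 0, 0]

print(cliffs_delta(with_memory, without_memory))                 # 1.0
print(round(exact_two_sided_p(with_memory, without_memory), 3))  # 0.008
```

With five runs per group, 0.008 is the smallest two-sided p-value such a test can return, so under this assumption the reported value reflects complete separation rather than a graded effect size.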
