When researchers claim that AI systems possess Theory of Mind (ToM) or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, overlooking a crucial distinction between simulation and experience. Although recent studies report that large language models (LLMs) achieve human-level performance on laboratory ToM tasks, these results rest solely on behavioral mimicry. More importantly, the entire testing paradigm may be flawed: it applies cognitive tests designed for individual humans to AI systems, rather than assessing human cognition directly in the moment of human-AI interaction. I propose shifting the focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms and emphasize interaction dynamics, instead of testing AI in isolation.