Humans rapidly learn abstract knowledge when encountering novel environments and flexibly deploy this knowledge to guide efficient and intelligent action. Can modern AI systems learn and plan in a similar way? We study this question using a dataset of complex human gameplay with concurrent fMRI recordings, in which participants learn novel video games that require rule discovery, hypothesis revision, and multi-step planning. We jointly evaluate models by their ability to play the games, match human learning behavior, and predict brain activity during the same task, comparing a suite of frontier Large Reasoning Models (LRMs) against model-free and model-based deep reinforcement learning agents and a Bayesian theory-based agent. We find that frontier LRMs most closely match human behavioral patterns during game discovery and predict brain activity an order of magnitude better than both reinforcement learning alternatives across cortical and subcortical regions, with effects robust to permutation controls. Through targeted manipulations, we further show that brain alignment reflects the model's in-context representation of the game state rather than its downstream planning or reasoning. Our results establish LRMs as compelling computational accounts of human learning and decision making in complex, naturalistic environments. Project page with interactive replays: https://botcs.github.io/reason-to-play/