Adaptive cognition requires structured internal models that represent objects and their relations. Predictive neural networks are often proposed to form such "world models", yet their underlying mechanisms remain unclear. One hypothesis is that action-conditioned sequential prediction suffices for learning such world models. In this work, we investigate this possibility in a minimal in-silico setting: a recurrent neural network sequentially samples tokens from 2D continuous token scenes and is trained to predict the upcoming token from the current input and a saccade-like displacement. On novel scenes, prediction accuracy improves across the sequence, indicating in-context learning. Decoding analyses reveal path integration and dynamic binding of token identity to position. Interventional analyses show that new bindings can be learned late in the sequence and that out-of-distribution bindings can also be acquired. Together, these results demonstrate how structured representations relying on flexible binding emerge to support prediction, offering a mechanistic account of sequential world modeling relevant to cognitive science.
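To make the task concrete, the following is a minimal sketch of action-conditioned sequential prediction, not the authors' implementation: the names (`SaccadePredictor`, `sample_scene_sequence`), the random-fixation policy, and all sizes and hyperparameters are illustrative assumptions. An RNN receives the current token together with a saccade-like 2D displacement and is trained to predict the token at the new fixation.

```python
# Hypothetical sketch of the action-conditioned prediction setup
# (assumed sizes and a random-fixation policy; not the paper's code).
import torch
import torch.nn as nn

N_TOKENS, SEQ_LEN, HIDDEN = 16, 20, 128

class SaccadePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_TOKENS, 32)
        # Input at each step: current token embedding + 2D displacement
        # (the saccade vector) to the next fixation.
        self.rnn = nn.GRU(32 + 2, HIDDEN, batch_first=True)
        self.readout = nn.Linear(HIDDEN, N_TOKENS)

    def forward(self, tokens, displacements):
        x = torch.cat([self.embed(tokens), displacements], dim=-1)
        h, _ = self.rnn(x)
        return self.readout(h)  # logits for the upcoming token

def sample_scene_sequence(batch):
    """Place N_TOKENS tokens at random 2D positions, then draw a random
    fixation sequence over them (an assumed stand-in for the scenes)."""
    pos = torch.rand(batch, N_TOKENS, 2)                    # token positions
    idx = torch.randint(0, N_TOKENS, (batch, SEQ_LEN + 1))  # fixation order
    fix = torch.gather(pos, 1, idx.unsqueeze(-1).expand(-1, -1, 2))
    disp = fix[:, 1:] - fix[:, :-1]                         # saccade vectors
    return idx[:, :-1], disp, idx[:, 1:]                    # input, action, target

model = SaccadePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    tokens, disp, target = sample_scene_sequence(batch=64)
    logits = model(tokens, disp)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, N_TOKENS), target.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

Because each scene's token-to-position bindings are resampled, the recurrent state can only predict well by accumulating displacements (path integration) and binding token identities to the resulting positions in context, which is the behavior the decoding and interventional analyses probe.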