Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes

Humans and animals have a rich and flexible understanding of the physical world. This understanding lets them infer the underlying dynamical trajectories of objects and events and their plausible future states, and use these inferences to plan and anticipate the consequences of actions. However, the neural mechanisms underlying these computations are unclear. We combine a goal-driven modeling approach with dense neurophysiological data and high-throughput human behavioral readouts to directly address this question. Specifically, we construct and evaluate several classes of sensory-cognitive networks that predict the future state of rich, ethologically relevant environments. These range from self-supervised end-to-end models with pixel-wise or object-centric objectives to models that future predict in the latent space of pretrained foundation models, either purely static (image-based) or dynamic (video-based). We find strong differentiation across these model classes in their ability to predict neural and behavioral data, both within and across diverse environments. In particular, neural responses are currently best predicted by models trained to predict the future state of their environment in the latent space of pretrained foundation models optimized for dynamic scenes in a self-supervised manner. Notably, models that future predict in the latent space of video foundation models optimized to support a diverse range of sensorimotor tasks reasonably match both human behavioral error patterns and neural dynamics across all the environmental scenarios we were able to test. Overall, these findings suggest that the neural mechanisms and behaviors of primate mental simulation are thus far most consistent with being optimized to future predict on dynamic, reusable visual representations that are useful for embodied AI more generally.
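The best-performing model class above predicts future states in the latent space of a pretrained encoder rather than in pixel space. The sketch below illustrates only that general idea and is not the authors' implementation. It assumes a hypothetical frozen encoder (`make_frozen_encoder`, a fixed random projection followed by `tanh`, standing in for a pretrained video foundation model). It also assumes a simple linear dynamics model `A`, fit by gradient descent so that `A @ z_t` approximates `z_{t+1}`. The toy sinusoidal "scene" and all function names are illustrative assumptions.

```python
import math
import random

# Hypothetical stand-in for a frozen, pretrained encoder: a fixed random
# projection followed by tanh. Its weights are never updated during training.
def make_frozen_encoder(in_dim, latent_dim, seed=0):
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
         for _ in range(latent_dim)]

    def encode(frame):
        return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in W]

    return encode


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def latent_loss(A, latents):
    """Mean squared error of one-step predictions, computed in latent space."""
    pairs = list(zip(latents[:-1], latents[1:]))
    return sum(mse(matvec(A, z), z_next) for z, z_next in pairs) / len(pairs)


def train_latent_dynamics(frames, encode, steps=500, lr=0.1):
    """Fit a linear dynamics model A on frozen latents z_t = encode(frame_t)."""
    latents = [encode(f) for f in frames]
    d = len(latents[0])
    A = [[0.0] * d for _ in range(d)]  # start from the "predict zero" model
    pairs = list(zip(latents[:-1], latents[1:]))
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for z, z_next in pairs:
            err = [p - t for p, t in zip(matvec(A, z), z_next)]
            for i in range(d):
                for j in range(d):
                    # d/dA_ij of the mean squared latent error
                    grad[i][j] += 2.0 * err[i] * z[j] / (d * len(pairs))
        for i in range(d):
            for j in range(d):
                A[i][j] -= lr * grad[i][j]
    return A, latents


def rollout(A, z0, horizon):
    """Simulate forward in latent space without observing new frames."""
    traj, z = [], z0
    for _ in range(horizon):
        z = matvec(A, z)
        traj.append(z)
    return traj


# Toy "scene": an 8-dimensional signal whose phase drifts over time.
frames = [[math.sin(0.3 * t + k) for k in range(8)] for t in range(20)]
encode = make_frozen_encoder(in_dim=8, latent_dim=4)
A, latents = train_latent_dynamics(frames, encode)

baseline = latent_loss([[0.0] * 4 for _ in range(4)], latents)  # predict zero
fitted = latent_loss(A, latents)
future = rollout(A, latents[-1], horizon=5)
print(f"baseline latent MSE: {baseline:.4f}, fitted latent MSE: {fitted:.4f}")
```

Keeping the encoder fixed and placing the loss on `z_{t+1}` rather than on raw frames is the distinction the abstract draws between latent-space and pixel-wise future prediction. The rollout then "mentally simulates" several steps ahead using only the learned dynamics.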
