Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask whether machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence that mapping is a fundamental mechanism for navigation by intelligent embodied agents, whether biological or artificial. Unlike in studies of animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to rule out alternative navigation mechanisms. Specifically, we train 'blind' agents -- with sensing limited to egomotion only -- to perform PointGoal navigation ('go to $\Delta x$, $\Delta y$') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully-connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that (1) blind agents are surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) maps and collision-detection neurons emerge in the representations of the environment that a blind agent builds as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but rather a surprising finding, an insight, and an explanation.