In visual navigation, an agent must be able to map a novel environment in order to exploit its observation history in that environment and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent perceives spatial relationships and regularities and discovers object characteristics. Recent work introduces learnable policies parametrized by deep neural networks and trained with Reinforcement Learning (RL). In classical RL setups, the capacity to map and reason spatially is learned end-to-end, from reward alone. In this setting, we introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. Our method significantly improves the performance of different baseline agents, which build either an explicit or an implicit representation of the environment, even matching the performance of oracle agents that are not directly comparable because they take ground-truth maps as input. A learning-based agent from the literature, trained with the proposed auxiliary losses, was the winning entry to the Multi-Object Navigation Challenge, part of the CVPR 2021 Embodied AI Workshop.
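To make the idea of auxiliary spatial supervision concrete, the following is a minimal sketch, not the authors' implementation. It assumes two hypothetical auxiliary targets, a binned relative direction and a binned Euclidean distance from the agent to the goal, and adds their cross-entropy losses to the RL loss with a hypothetical weight `aux_weight`. The function names, bin counts, heading convention (radians, counterclockwise), and the weighting scheme are all illustrative assumptions; the abstract only states that the agent learns to estimate metrics quantifying its spatial relationship to the goal.

```python
import math


def spatial_targets(agent_xy, agent_heading, goal_xy,
                    num_dir_bins=12, num_dist_bins=10, max_dist=10.0):
    """Return (direction_bin, distance_bin) labels for one time step.

    Assumptions (not from the source): 2D positions, heading in radians
    measured counterclockwise, uniform bins, distances clipped at max_dist.
    """
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)

    # Angle of the goal relative to the agent's heading, wrapped to [0, 2*pi).
    rel_angle = (math.atan2(dy, dx) - agent_heading) % (2 * math.pi)
    direction_bin = int(rel_angle / (2 * math.pi) * num_dir_bins) % num_dir_bins

    # Distances beyond max_dist fall into the last bin.
    distance_bin = min(int(distance / max_dist * num_dist_bins), num_dist_bins - 1)
    return direction_bin, distance_bin


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def total_loss(rl_loss, dir_logits, dist_logits, dir_target, dist_target,
               aux_weight=0.25):
    """Combine the RL loss with the two auxiliary classification losses.

    aux_weight is a hypothetical hyperparameter chosen for illustration.
    """
    aux = cross_entropy(dir_logits, dir_target) + cross_entropy(dist_logits, dist_target)
    return rl_loss + aux_weight * aux


if __name__ == "__main__":
    d_bin, r_bin = spatial_targets((0.0, 0.0), 0.0, (3.0, 3.0))
    print(d_bin, r_bin)
    print(total_loss(1.0, [0.0] * 12, [0.0] * 10, d_bin, r_bin))
```

In a real agent the logits would come from prediction heads attached to the policy's recurrent state or map representation, and the auxiliary targets would be computed from simulator ground truth at training time only, so that the test-time agent needs no privileged information.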