We propose MOPA (Modular ObjectNav with PointGoal agents), a simple but effective modular approach for systematically investigating the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an object detection module trained to identify objects from RGB images, (b) a map building module that builds a semantic map of the observed objects, (c) an exploration module that lets the agent explore the environment, and (d) a navigation module that moves the agent to identified target objects. We show that a pretrained PointGoal agent can be reused effectively as the navigation module instead of learning to navigate from scratch, which saves training time and compute. We also compare several exploration strategies for MOPA and find that a simple uniform strategy significantly outperforms more advanced exploration methods.
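The interplay of the four modules can be illustrated with a minimal sketch. It is not the authors' implementation: the names `SemanticMap`, `uniform_exploration_goal`, and `select_point_goal` are hypothetical, detections are assumed to arrive already projected to 2D map coordinates, and the "uniform" strategy is assumed here to mean sampling an exploration goal uniformly at random within the map bounds. In each case the output is a point goal that a pretrained PointGoal agent could be asked to reach.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Bounds = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class SemanticMap:
    """Hypothetical map module: last observed 2D position per object category."""
    objects: Dict[str, Point] = field(default_factory=dict)

    def update(self, detections: Dict[str, Point]) -> None:
        # Detections are assumed to be already projected into map coordinates.
        self.objects.update(detections)

    def lookup(self, category: str) -> Optional[Point]:
        return self.objects.get(category)


def uniform_exploration_goal(bounds: Bounds, rng: random.Random) -> Point:
    """Assumed reading of 'uniform' exploration: sample a goal uniformly in bounds."""
    (x_min, x_max), (y_min, y_max) = bounds
    return (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))


def select_point_goal(
    target: str,
    detections: Dict[str, Point],
    semantic_map: SemanticMap,
    bounds: Bounds,
    rng: random.Random,
) -> Tuple[Point, str]:
    """Pick the next point goal to hand to a PointGoal navigation agent.

    If the target category is already on the map, navigate to it;
    otherwise fall back to an exploration goal.
    """
    semantic_map.update(detections)
    located = semantic_map.lookup(target)
    if located is not None:
        return located, "navigate"
    return uniform_exploration_goal(bounds, rng), "explore"
```

In this sketch the navigation module never needs to know about object categories: both exploration and target-reaching reduce to point goals, which is what would make reuse of a pretrained PointGoal agent possible.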