The Multi-Object Navigation (MultiON) task requires a robot to localize an instance of each of multiple object classes. It is a fundamental task for an assistive robot in a home or a factory. Existing methods for MultiON have viewed this as a direct extension of Object Navigation (ON), the task of localizing an instance of one object class, and are pre-sequenced, i.e., the sequence in which the object classes are to be explored is provided in advance. This is a strong limitation in practical applications characterized by dynamic changes. This paper describes a deep reinforcement learning framework for sequence-agnostic MultiON based on an actor-critic architecture and a suitable reward specification. Our framework leverages past experiences and seeks to reward progress toward individual as well as multiple target object classes. We use photo-realistic scenes from the Gibson benchmark dataset in the AI Habitat 3D simulation environment to experimentally show that our method performs better than a pre-sequenced approach and a state-of-the-art ON method extended to MultiON.
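The abstract states only that the reward credits progress toward individual as well as multiple target object classes, without giving its terms. The following is a minimal hypothetical sketch of what such a sequence-agnostic, progress-based reward could look like. It is not the reward used in this work. The function name `progress_reward`, the weights `w_single` and `w_multi`, the `find_bonus`, the `success_radius`, and the use of Euclidean distance instead of geodesic distance are all illustrative assumptions. No object order is imposed: the agent is credited for approaching whichever remaining target is closest, for reducing its total distance to all remaining targets, and for finding any remaining class.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    # Assumption: Euclidean distance stands in for the geodesic distance a simulator would provide.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def progress_reward(
    prev_pos: Point,
    curr_pos: Point,
    targets: Dict[str, Point],  # hypothetical: object class -> goal position
    found: Set[str],            # classes already localized in this episode
    success_radius: float = 1.0,
    find_bonus: float = 2.5,
    w_single: float = 1.0,
    w_multi: float = 0.5,
) -> Tuple[float, Set[str]]:
    """Hypothetical sequence-agnostic reward for one step (illustrative only)."""
    remaining = {c: p for c, p in targets.items() if c not in found}
    if not remaining:
        return 0.0, set()

    # Individual progress: change in distance to the nearest remaining target, whatever its class.
    prev_min = min(dist(prev_pos, p) for p in remaining.values())
    curr_min = min(dist(curr_pos, p) for p in remaining.values())

    # Multi-target progress: change in the summed distance to all remaining targets.
    prev_sum = sum(dist(prev_pos, p) for p in remaining.values())
    curr_sum = sum(dist(curr_pos, p) for p in remaining.values())

    reward = w_single * (prev_min - curr_min) + w_multi * (prev_sum - curr_sum)

    # Any remaining class within the success radius counts as found, in any order.
    newly_found = {c for c, p in remaining.items() if dist(curr_pos, p) <= success_radius}
    reward += find_bonus * len(newly_found)
    return reward, newly_found


if __name__ == "__main__":
    goals = {"chair": (3.0, 0.0), "sofa": (0.0, 4.0)}
    r, new = progress_reward((2.0, 0.0), (3.0, 0.0), goals, found=set())
    print(round(r, 3), new)
```

Because the nearest-target term is recomputed over whatever classes remain, the same function applies regardless of the order in which the agent happens to reach the targets. This is the property a pre-sequenced reward would lack.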