In unknown, cluttered, and dynamic environments such as disaster scenes, mobile robots must perform target-driven navigation to find people or objects of interest, guided solely by images of the targets. In this paper, we introduce NavFormer, a novel end-to-end transformer architecture for target-driven robot navigation in unknown and dynamic environments. NavFormer leverages the strengths of both 1) transformers for sequential data processing and 2) self-supervised learning (SSL) for visual representation to reason about spatial layouts and to perform collision avoidance in dynamic settings. The architecture uses dual visual encoders: a static encoder that extracts invariant environment features for spatial reasoning, and a general encoder for dynamic obstacle avoidance. The primary navigation task is decomposed into two sub-tasks for training: single-robot exploration and multi-robot collision avoidance. We perform cross-task training so that the learned skills transfer to the complex primary navigation task without task-specific fine-tuning. Simulated experiments demonstrate that NavFormer can effectively navigate a mobile robot in diverse unknown environments, outperforming existing state-of-the-art methods in terms of success rate and success weighted by (normalized inverse) path length. Furthermore, a comprehensive ablation study evaluates the impact of the main design choices in the structure and training of NavFormer, validating their contribution to the overall system.
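For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how success rate and success weighted by (normalized inverse) path length are commonly computed over a set of evaluation episodes. It follows the standard definition, in which each successful episode contributes the ratio of the shortest-path length to the larger of the actual and shortest path lengths. The `Episode` container, its field names, and the function names are hypothetical and chosen for illustration; they are not taken from the NavFormer implementation, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical container, for illustration only)."""
    success: bool         # True if the robot reached the target
    shortest_path: float  # shortest-path length from start to target
    actual_path: float    # length of the path the robot actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the target was reached."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by (normalized inverse) path length.

    Failed episodes contribute 0. Successful episodes contribute
    shortest_path / max(actual_path, shortest_path), so a path that is
    twice as long as the shortest one contributes 0.5.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue
        denom = max(e.actual_path, e.shortest_path)
        # Assumption: a zero-length successful episode counts as optimal.
        total += 1.0 if denom == 0 else e.shortest_path / denom
    return total / len(episodes)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    demo = [
        Episode(success=True, shortest_path=10.0, actual_path=10.0),
        Episode(success=True, shortest_path=10.0, actual_path=20.0),
        Episode(success=False, shortest_path=10.0, actual_path=5.0),
    ]
    print(f"success rate = {success_rate(demo):.3f}")
    print(f"SPL          = {spl(demo):.3f}")
```

Because the path-length ratio is capped at 1 and failures score 0, this metric can never exceed the success rate; the gap between the two indicates how much longer than optimal the successful paths were.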