Despite some successful applications of goal-driven navigation, existing deep reinforcement learning (DRL)-based approaches notoriously suffer from poor data efficiency. One reason is that the goal information is decoupled from the perception module and introduced directly as a condition of decision-making, so that goal-irrelevant features of the scene representation play an adversarial role during learning. In light of this, we present a novel Goal-guided Transformer-enabled reinforcement learning (GTRL) approach that takes the physical goal states as an input to the scene encoder, guiding the scene representation to couple with the goal information and realizing efficient autonomous navigation. More specifically, we propose a novel variant of the Vision Transformer as the backbone of the perception system, namely the Goal-guided Transformer (GoT), and pre-train it with expert priors to boost data efficiency. Subsequently, a reinforcement learning algorithm is instantiated for the decision-making system, taking the goal-oriented scene representation from the GoT as input and generating decision commands. As a result, our approach encourages the scene representation to concentrate mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL learning process and leads to superior navigation performance. Both simulation and real-world experimental results demonstrate the superiority of our approach over other state-of-the-art (SOTA) baselines in terms of data efficiency, performance, robustness, and sim-to-real generalization. The demonstration video (https://www.youtube.com/watch?v=aqJCHcsj4w0) and the source code (https://github.com/OscarHuangWind/DRL-Transformer-SimtoReal-Navigation) are also provided.
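To make the central idea concrete, the following is a minimal NumPy sketch of a scene encoder that receives the goal state as an extra input token, so that self-attention can weight image patches by their relevance to the goal. It is not the authors' GoT implementation (the linked repository is the authoritative source). The function names, the single attention head, the random untrained weights, the patch size, and the omission of positional embeddings, layer normalization, and MLP blocks are all simplifying assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, patch, goal_dim, d):
    """Random, untrained projection weights (hypothetical names)."""
    scale = 0.1
    return {
        "W_patch": rng.normal(0.0, scale, (patch * patch, d)),  # patch embedding
        "W_goal": rng.normal(0.0, scale, (goal_dim, d)),        # goal embedding
        "W_q": rng.normal(0.0, scale, (d, d)),
        "W_k": rng.normal(0.0, scale, (d, d)),
        "W_v": rng.normal(0.0, scale, (d, d)),
    }


def goal_guided_encode(image, goal, params, patch=4):
    """Encode a grayscale image jointly with a goal vector.

    Returns the output at the goal-token position (used here as the
    scene feature) and the attention weights of the goal token.
    """
    h, w = image.shape
    # Cut the image into non-overlapping patch x patch tiles, one row per tile.
    tiles = (
        image.reshape(h // patch, patch, w // patch, patch)
        .transpose(0, 2, 1, 3)
        .reshape(-1, patch * patch)
    )
    patch_tokens = tiles @ params["W_patch"]          # (n_patches, d)
    goal_token = goal @ params["W_goal"]              # (d,)
    # The goal enters the encoder as a token, not as a late policy input.
    tokens = np.vstack([goal_token, patch_tokens])    # (1 + n_patches, d)

    q = tokens @ params["W_q"]
    k = tokens @ params["W_k"]
    v = tokens @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # rows sum to 1
    out = attn @ v
    return out[0], attn[0]


rng = np.random.default_rng(0)
params = init_params(rng, patch=4, goal_dim=2, d=16)
image = rng.random((8, 8))            # stand-in for a camera frame
goal = np.array([1.5, -0.3])          # e.g. relative distance and heading (assumed)
feature, goal_attention = goal_guided_encode(image, goal, params)
```

In this sketch the returned `feature` would be what a downstream DRL policy consumes. Because the goal token takes part in attention, a different goal yields a different scene feature for the same image, which is the coupling the abstract describes.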