Robot crowd navigation has attracted growing attention in a variety of practical applications. Existing research applies deep reinforcement learning to robot crowd navigation by training policies online. However, online training inevitably involves unsafe exploration and, consequently, low sample efficiency during pedestrian-robot interaction. To address this, we propose an offline reinforcement learning-based robot crowd navigation algorithm that learns from pre-collected crowd navigation experience. Specifically, the algorithm integrates a spatial-temporal state into implicit Q-learning (IQL), which avoids querying robot actions that are out of distribution with respect to the pre-collected experience, while capturing spatial-temporal features from the offline pedestrian-robot interactions. Qualitative and quantitative experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods.
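The abstract credits implicit Q-learning with avoiding out-of-distribution action queries. The minimal sketch below shows why, assuming the standard IQL formulation: expectile regression for the state value and advantage-weighted policy extraction. It omits the spatial-temporal state encoder entirely. V(s) is fitted only to the Q-values of actions that actually appear in the dataset, so no unseen action is ever evaluated. The expectile `tau`, temperature `beta`, weight cap, and the function names `fit_expectile` and `awr_weights` are hypothetical illustrations, not details taken from this work.

```python
import numpy as np

def fit_expectile(q_values: np.ndarray, tau: float, lr: float = 0.1, steps: int = 2000) -> float:
    """Fit V by gradient descent on the IQL-style expectile loss
    mean(|tau - 1(Q - V < 0)| * (Q - V)**2).

    Only Q-values of in-dataset actions are used, so no
    out-of-distribution action is ever queried.
    """
    v = float(np.mean(q_values))  # start from the mean (the tau = 0.5 expectile)
    for _ in range(steps):
        diff = q_values - v
        # Asymmetric weights: tau for Q above V, (1 - tau) for Q below V.
        weight = np.where(diff < 0, 1.0 - tau, tau)
        grad = -2.0 * np.mean(weight * diff)  # d/dV of the mean expectile loss
        v -= lr * grad
    return float(v)

def awr_weights(q_values: np.ndarray, v: float, beta: float = 3.0, max_weight: float = 100.0) -> np.ndarray:
    """Advantage-weighted regression weights exp(beta * (Q - V)), clipped.

    These weights would scale a behaviour-cloning loss on dataset actions
    (hypothetical hyperparameters beta and max_weight).
    """
    return np.minimum(np.exp(beta * (q_values - v)), max_weight)

if __name__ == "__main__":
    # Hypothetical Q-values of three robot actions recorded at one state.
    q = np.array([1.0, 2.0, 3.0])
    print(fit_expectile(q, 0.5))  # the mean, 2.0
    v = fit_expectile(q, 0.9)
    print(v)                      # approximately 2.727, still below max(q)
    print(awr_weights(q, v))      # larger weight for higher-advantage actions
```

Raising `tau` toward 1 pushes V toward the best action *in the data* rather than the best action overall, which is how IQL approximates a max over actions without evaluating actions outside the pre-collected experience.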