Route planning is essential to mobile robot navigation. In recent years, deep reinforcement learning (DRL) has been applied to learn optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, and the performance of such policies can vary greatly when the environment is highly stochastic. In this work, we propose a distributional reinforcement learning framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation can then make adjustable route decisions according to user preferences for performance robustness. We evaluate the proposed method in a simulated road network environment. Experimental results show that, when robustness is preferred, our method plans the shortest routes that minimize travel-time stochasticity, whereas other state-of-the-art DRL methods are agnostic to environmental stochasticity.
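The SSD relation between two return distributions can be illustrated with a short sketch. It assumes that each action's return distribution is represented by N equally weighted quantile atoms, as in quantile-regression distributional RL, and that returns are negative travel times, so larger is better. The names `ssd_dominates` and `select_robust_action`, and the fallback to the expected return, are hypothetical illustrations rather than the authors' method. For equal-weight atoms, X dominates Y in the second order exactly when the running sums of X's sorted atoms are never below those of Y.

```python
import numpy as np


def ssd_dominates(x_quantiles, y_quantiles, tol=1e-9):
    """Return True if X second-order stochastically dominates Y.

    Assumption: both inputs are N equally weighted quantile atoms of a
    return distribution (larger return is better). X SSD Y holds iff the
    integrated quantile function of X is >= that of Y on [0, 1]; for
    equal-weight atoms this reduces to comparing cumulative sums of the
    sorted atoms at every knot k/N.
    """
    x = np.sort(np.asarray(x_quantiles, dtype=float))
    y = np.sort(np.asarray(y_quantiles, dtype=float))
    if x.shape != y.shape:
        raise ValueError("expected the same number of quantile atoms")
    # Compare running sums of the worst-case outcomes first.
    return bool(np.all(np.cumsum(x) >= np.cumsum(y) - tol))


def select_robust_action(action_quantiles):
    """Hypothetical selection rule: pick an action whose return
    distribution SSD-dominates every other action; if none exists,
    fall back to the action with the largest expected return."""
    n = len(action_quantiles)
    for a in range(n):
        if all(ssd_dominates(action_quantiles[a], action_quantiles[b])
               for b in range(n) if b != a):
            return a
    means = [float(np.mean(q)) for q in action_quantiles]
    return int(np.argmax(means))


# Two hypothetical routes with the same expected return (-10):
risky = [-16.0, -12.0, -8.0, -4.0]   # variable travel time
steady = [-10.0, -10.0, -10.0, -10.0]  # deterministic travel time
print(select_robust_action([risky, steady]))
```

Both routes have an expected return of -10, so a rule that only maximizes the mean cannot tell them apart (here `np.argmax` would simply return the first route), while the SSD check selects the steady route because its worst-case outcomes accumulate more slowly.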