Because energy consumption differs vastly between up-slope and down-slope travel, the shortest path on complex off-road terrain (a 2.5D map) is not always the path with the least energy consumption. For any energy-sensitive vehicle, achieving a good trade-off between distance and energy consumption in 2.5D path planning is therefore important. In this paper, we propose a deep reinforcement learning-based 2.5D multi-objective path planning method (DMOP). DMOP efficiently finds the desired path in three steps: (1) transform the high-resolution 2.5D map into a small-size map; (2) use a trained deep Q network (DQN) to find the desired path on the small-size map; (3) map the planned path back onto the original high-resolution map using a path-enhancement method. In addition, a hybrid exploration strategy and reward shaping theory are applied to train the DQN, and the reward function is constructed from terrain, distance, and border information. Simulation results show that the proposed method completes the multi-objective 2.5D path planning task with high efficiency. While producing similar paths, the proposed method is more than 100 times faster than the A* method and 30 times faster than the H3DM method. The simulations also show that the method has strong reasoning capability, which enables it to perform arbitrary planning tasks not seen during training.
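The opening claim, that the shortest path on a 2.5D map need not be the least-energy path, can be illustrated with a minimal sketch. This is not the DMOP method: it uses plain Dijkstra search instead of a trained DQN, and the energy model (step length plus a penalty `UPHILL` per unit of climb, with no recovery on descent), the weight `w`, the toy map `HEIGHT`, and all function names are assumptions made only for illustration.

```python
import heapq
import math

UPHILL = 5.0  # assumed extra energy per unit of climb (illustrative only)


def step_costs(height, a, b):
    """Return (distance, energy) for moving between adjacent cells a -> b."""
    dh = height[b[0]][b[1]] - height[a[0]][a[1]]
    distance = math.hypot(1.0, dh)             # 3D length of one grid step
    energy = distance + UPHILL * max(0.0, dh)  # climbing costs extra, descending does not
    return distance, energy


def plan(height, start, goal, w):
    """Dijkstra on a 4-connected grid minimizing (1 - w) * distance + w * energy."""
    rows, cols = len(height), len(height[0])
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            d, e = step_costs(height, cell, nxt)
            new_cost = cost + (1.0 - w) * d + w * e
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(queue, (new_cost, nxt))
    # Walk the parent links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


def totals(height, path):
    """Sum distance and energy along a path."""
    pairs = [step_costs(height, a, b) for a, b in zip(path, path[1:])]
    return sum(d for d, _ in pairs), sum(e for _, e in pairs)


HEIGHT = [
    [0, 1, 1, 1, 0],  # a low ridge between start and goal
    [0, 0, 0, 0, 0],  # a flat detour around the ridge
]
shortest = plan(HEIGHT, (0, 0), (0, 4), w=0.0)  # distance only
cheapest = plan(HEIGHT, (0, 0), (0, 4), w=1.0)  # energy only
```

Under these assumptions, the distance-only plan crosses the ridge while the energy-only plan takes the longer flat detour; intermediate values of `w` trade one objective against the other, which is the kind of balance a multi-objective planner has to strike.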