Because energy consumption differs greatly between uphill and downhill travel, the shortest path on complex off-road terrain (a 2.5D map) is not always the path with the least energy consumption. For energy-sensitive vehicles, achieving a good trade-off between distance and energy consumption in 2.5D path planning is therefore important. In this paper, a deep reinforcement learning-based 2.5D multi-objective path planning method (DMOP) is proposed. DMOP finds the desired path efficiently in three steps: (1) transform the high-resolution 2.5D map into a small-size map; (2) use a trained deep Q network (DQN) to find the desired path on the small-size map; (3) map the planned path back onto the original high-resolution map using a path-enhancement method. In addition, imitation learning and reward shaping are applied to train the DQN, and the reward function is constructed from terrain, distance, and border information. Simulations show that the proposed method can complete the multi-objective 2.5D path planning task. They also show that the method has strong reasoning capability, which enables it to perform arbitrary untrained planning tasks on the same map.
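The distance versus energy trade-off that motivates the abstract can be illustrated on a toy height map. The sketch below is not the DMOP method and does not use a DQN: it swaps in a classical Dijkstra search with a weighted cost, purely to show that the shortest path and the least-energy path can differ. The energy model and its coefficients (`UP_COST`, `DOWN_COST`), the helper names, and the map are all assumptions made for illustration.

```python
import heapq
import math

UP_COST = 5.0    # assumed energy per unit of climb (hypothetical value)
DOWN_COST = 0.5  # assumed energy per unit of descent (hypothetical value)


def step_costs(height, a, b):
    """Return (distance, energy) for one move between adjacent cells a and b."""
    dh = height[b[0]][b[1]] - height[a[0]][a[1]]
    distance = math.hypot(1.0, dh)  # 3D length of a unit grid step
    # Assumed asymmetric model: climbing is far more expensive than descending.
    energy = 1.0 + UP_COST * max(dh, 0.0) + DOWN_COST * max(-dh, 0.0)
    return distance, energy


def plan(height, start, goal, w_distance):
    """Dijkstra on a 4-connected grid; cost = w*distance + (1 - w)*energy.

    Assumes the goal is reachable from the start.
    """
    rows, cols = len(height), len(height[0])
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            d, e = step_costs(height, cell, nxt)
            new_cost = cost + w_distance * d + (1.0 - w_distance) * e
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(queue, (new_cost, nxt))
    # Walk the parent links back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


def totals(height, path):
    """Sum (distance, energy) along a path."""
    pairs = [step_costs(height, a, b) for a, b in zip(path, path[1:])]
    return sum(d for d, _ in pairs), sum(e for _, e in pairs)


# Toy 2.5D map: a low ridge lies directly between start and goal.
HEIGHT = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
START, GOAL = (1, 0), (1, 4)

shortest = plan(HEIGHT, START, GOAL, w_distance=1.0)  # distance only
thrifty = plan(HEIGHT, START, GOAL, w_distance=0.0)   # energy only

print("shortest:", shortest, totals(HEIGHT, shortest))
print("thrifty: ", thrifty, totals(HEIGHT, thrifty))
```

Under these assumed coefficients, the distance-only plan crosses the ridge while the energy-only plan goes around it. Intermediate values of `w_distance` trade one objective against the other, which is the kind of balance a multi-objective planner has to strike.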