With the flourishing development of intelligent warehousing systems, Automated Guided Vehicle (AGV) technology has grown rapidly. In intelligent warehousing, an AGV must safely and rapidly plan an optimal path through a complex and dynamic environment. Most existing studies address this challenge with deep reinforcement learning. However, in environments with sparse extrinsic rewards, these algorithms often converge slowly, learn inefficiently, or fail to reach the target. Random Network Distillation (RND), an exploration enhancement, can effectively improve the performance of Proximal Policy Optimization (PPO) by giving the AGV agent additional intrinsic rewards, which is especially valuable in sparse-reward environments. Moreover, most current research still uses 2D grid mazes as experimental environments, which are insufficiently complex and offer only limited action sets. To overcome this limitation, we present simulation environments for AGV path planning with continuous actions and positions, bringing them closer to realistic physical scenarios. Our experiments and comprehensive analysis demonstrate that the proposed method enables AGVs to complete path planning tasks with continuous actions more rapidly in our environments. A video of part of our experiments can be found at https://youtu.be/lwrY9YesGmw.
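The role of the RND intrinsic reward can be illustrated with a toy sketch. It is not the implementation used in the experiments: the linear networks, the class name `LinearRND`, the function `total_reward`, the weighting coefficient `beta`, and all numeric values are assumptions made for illustration. A fixed random target network is imitated by a trained predictor, and the prediction error is used as a novelty bonus that shrinks for states the agent has visited often.

```python
import numpy as np


class LinearRND:
    """Toy Random Network Distillation module with linear networks (illustrative only)."""

    def __init__(self, obs_dim, feat_dim=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialised and then frozen.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to match the target on visited states.
        self.w_pred = rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        # Difference between predictor and target features for one observation.
        return obs @ self.w_pred - obs @ self.w_target

    def intrinsic_reward(self, obs):
        # Mean squared prediction error serves as the novelty bonus.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared error for this observation.
        err = self._error(obs)
        grad = 2.0 * np.outer(obs, err) / err.size
        self.w_pred -= self.lr * grad


def total_reward(extrinsic, intrinsic, beta=0.1):
    # beta is a hypothetical weighting coefficient, not a value from the paper.
    return extrinsic + beta * intrinsic


rnd = LinearRND(obs_dim=2)
visited = np.array([1.0, 0.0])  # a position the agent keeps returning to
unseen = np.array([0.0, 1.0])   # a position it has never reached

before_visited = rnd.intrinsic_reward(visited)
before_unseen = rnd.intrinsic_reward(unseen)
for _ in range(100):
    rnd.update(visited)
after_visited = rnd.intrinsic_reward(visited)
after_unseen = rnd.intrinsic_reward(unseen)
```

After training on the visited position only, its bonus collapses towards zero while the bonus for the unseen position stays unchanged, so the combined reward steers the agent towards unexplored regions even when the extrinsic reward is zero. A practical PPO integration would use deeper networks and normalise the intrinsic reward, which this sketch omits.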