Path planning plays a pivotal role in automated parking, yet current methods struggle to handle intricate and diverse parking scenarios efficiently. Reinforcement learning is one potential solution, since it can explore situations that have not been recorded. However, a key challenge in training reinforcement learning methods is the inherent randomness in converging to a feasible policy. This paper introduces a novel solution, the Hybrid POlicy Path plannEr (HOPE), which integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios. The paper presents a method to calculate and implement an action mask mechanism in path planning, which significantly improves the efficiency and effectiveness of reinforcement learning training. A transformer is employed as the network structure to fuse environmental information and generate planned paths. To facilitate the training and evaluation of the proposed planner, we propose a criterion for categorizing the difficulty level of parking scenarios based on space and obstacle distribution. Experimental results demonstrate that our approach outperforms typical rule-based algorithms and traditional reinforcement learning methods, achieving higher planning success rates and better generalization across various scenarios. The code for our solution will be openly available on \href{https://github.com/jiamiya/HOPE}{GitHub}.
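To make the action mask idea concrete, the following is a minimal sketch of how a mask can restrict a discrete policy to permitted actions before sampling. It is not the HOPE implementation: the abstract does not describe how the mask is computed, so the mask is taken here as a given list of booleans, and the names `masked_softmax` and `sample_action` are hypothetical.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over `logits`, restricted to actions where `mask` is True.

    Assumption: `mask` is supplied by some external feasibility check
    (for example a collision test); how HOPE computes it is not shown here.
    """
    if not any(mask):
        raise ValueError("action mask rules out every action")
    # Subtract the largest permitted logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits, mask, rng=random):
    """Sample an action index; masked actions have zero probability."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because masked actions receive exactly zero probability, the agent never spends exploration on them, which is one plausible reason such a mask could speed up training.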