In this paper, we tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments by designing a Model Predictive Control (MPC) scheme based on a Long Short-Term Memory (LSTM) network and integrating it into the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed solution, LSTM-MPC operates as a deterministic policy within the DDPG network, and it leverages a predicting pool that stores predicted future states and actions for improved robustness and efficiency. The predicting pool also enables the initialization of the critic network, which improves convergence speed and reduces the failure rate compared to traditional reinforcement learning and deep reinforcement learning methods. The effectiveness of the proposed solution is evaluated by numerical simulations.
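To make the role of the predicting pool more concrete, the following is a minimal sketch of how predicted states and actions could be stored and then used to initialize a critic before any real interaction. It is an illustration under stated assumptions, not the paper's implementation: the names `PredictingPool`, `add_rollout`, and `init_linear_critic` are hypothetical, a simple single-integrator model stands in for the LSTM predictor, a clipped proportional controller stands in for the LSTM-MPC policy, and a linear least-squares critic stands in for the DDPG critic network.

```python
from collections import deque

import numpy as np


class PredictingPool:
    """Fixed-capacity buffer of predicted (state, action, reward, next_state) tuples."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # oldest predictions are dropped first

    def add_rollout(self, state, policy, model, reward_fn, horizon):
        # Roll the predictive model forward `horizon` steps and store each step.
        s = np.asarray(state, dtype=float)
        for _ in range(horizon):
            a = policy(s)
            s_next = model(s, a)
            self.buffer.append((s, a, reward_fn(s_next), s_next))
            s = s_next

    def __len__(self):
        return len(self.buffer)


def init_linear_critic(pool, policy, gamma=0.9, sweeps=20):
    """Fit Q(s, a) ~ w . [s, a, 1] to the pooled predictions by fitted Q-iteration."""
    S, A, R, S2 = (np.array(x) for x in zip(*pool.buffer))
    feats = np.hstack([S, A, np.ones((len(S), 1))])
    A2 = np.array([policy(s) for s in S2])
    feats2 = np.hstack([S2, A2, np.ones((len(S2), 1))])
    w = np.zeros(feats.shape[1])
    for _ in range(sweeps):
        targets = R + gamma * feats2 @ w  # bootstrapped one-step targets
        w, *_ = np.linalg.lstsq(feats, targets, rcond=None)
    return w


# Hypothetical 2-D planning task: all quantities below are assumptions.
goal = np.array([5.0, 5.0])
policy = lambda s: np.clip(goal - s, -1.0, 1.0)   # stand-in for LSTM-MPC
model = lambda s, a: s + 0.1 * a                  # stand-in for the LSTM predictor
reward_fn = lambda s: -np.linalg.norm(s - goal)   # negative distance to goal

rng = np.random.default_rng(0)
pool = PredictingPool(capacity=500)
for _ in range(20):
    pool.add_rollout(rng.uniform(0.0, 10.0, size=2), policy, model, reward_fn, horizon=10)

w = init_linear_critic(pool, policy)  # initial critic weights from predictions only
```

In this sketch the critic is fitted purely on predicted transitions, which mirrors the idea that the pool can supply an initialization before training begins; in a full DDPG setup the fitted weights would be replaced by a pre-training pass over the critic network.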