In this paper, we tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments by designing a Model Predictive Control (MPC) scheme based on a Long Short-Term Memory (LSTM) network and integrating it into the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed solution, the LSTM-MPC operates as the deterministic policy within the DDPG network and leverages a predicting pool that stores predicted future states and actions, which improves robustness and efficiency. The predicting pool also enables the initialization of the critic network, leading to faster convergence and a lower failure rate than traditional reinforcement learning and deep reinforcement learning methods. The effectiveness of the proposed solution is evaluated through numerical simulations.
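To make the role of the predicting pool more concrete, the following is a minimal sketch of one way such a pool could be organized. It is not the implementation used in this paper. The names `PredictingPool`, `predict_rollout`, `toy_policy`, and `toy_model` are hypothetical, the proportional controller stands in for the LSTM-MPC policy, the single-integrator model stands in for the learned LSTM predictor, and the negative-distance reward is an assumption made only for illustration.

```python
import math
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]
# (state, action, reward, next_state); storing the reward is an assumption.
Transition = Tuple[State, Action, float, State]


class PredictingPool:
    """Bounded buffer of predicted transitions (hypothetical structure)."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently drops the oldest entries when full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def add_rollout(self, rollout: Sequence[Transition]) -> None:
        self._buffer.extend(rollout)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        k = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)


def toy_policy(state: State, goal: State, max_step: float = 1.0) -> Action:
    """Stand-in for the LSTM-MPC policy: bounded step toward the goal."""
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return (dx, dy)
    return (dx / dist * max_step, dy / dist * max_step)


def toy_model(state: State, action: Action) -> State:
    """Stand-in for the learned predictor: single-integrator kinematics."""
    return (state[0] + action[0], state[1] + action[1])


def predict_rollout(
    state: State,
    goal: State,
    horizon: int,
    policy: Callable[[State, State], Action],
    model: Callable[[State, Action], State],
) -> List[Transition]:
    """Roll the model forward for `horizon` steps under `policy`."""
    rollout: List[Transition] = []
    for _ in range(horizon):
        action = policy(state, goal)
        next_state = model(state, action)
        # Assumed reward: negative Euclidean distance to the goal.
        reward = -math.hypot(goal[0] - next_state[0], goal[1] - next_state[1])
        rollout.append((state, action, reward, next_state))
        state = next_state
    return rollout
```

In a sketch like this, batches drawn with `sample` could be used to pre-train a critic before any interaction with the environment, which is one plausible reading of how the pool supports critic initialization.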