In this paper, we propose novel learning frameworks for optimal control problems that apply the Pontryagin maximum principle and then solve the resulting Hamiltonian dynamical system. Applying the Pontryagin maximum principle to the original optimal control problem shifts the learning focus to the reduced Hamiltonian dynamics and the corresponding adjoint variables. The reduced Hamiltonian networks can then be learned by going backward in time and minimizing a loss function derived from the conditions of the Pontryagin maximum principle. The learning process is further improved by progressively learning a posterior distribution of the reduced Hamiltonians. This is achieved with a variational autoencoder, which leads to a more effective path exploration process. We apply our learning frameworks, called NeuralPMP, to various control tasks and obtain competitive results.
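For orientation, the conditions referred to above can be written out in their standard textbook form. The notation is an assumption made for illustration and is not taken from the paper: state $q$, control $u$, adjoint $p$, dynamics $f$, running cost $l$, terminal cost $\Phi$, horizon $T$, and the sign convention in which the optimal control maximizes $H$.

\[
\begin{aligned}
&\text{minimize } J(u) = \int_0^T l(q(t), u(t))\,dt + \Phi(q(T)) \quad \text{subject to } \dot q = f(q, u),\ q(0) = q_0, \\
&H(q, p, u) = p^\top f(q, u) - l(q, u), \qquad h(q, p) = \max_u H(q, p, u), \\
&\dot q = \frac{\partial h}{\partial p}(q, p), \qquad \dot p = -\frac{\partial h}{\partial q}(q, p), \qquad p(T) = -\nabla \Phi(q(T)).
\end{aligned}
\]

Here $h$ is the reduced Hamiltonian, and the last line assumes $h$ is differentiable. The terminal condition on $p$ is what makes working backward in time natural: the adjoint is pinned at $t = T$ rather than at $t = 0$.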