Optimal control problems can be solved by applying the Pontryagin maximum principle and then solving the resulting Hamiltonian dynamical system. In this paper, we propose novel learning frameworks for optimal control problems. Applying the Pontryagin maximum principle to the original optimal control problem shifts the learning focus to the reduced Hamiltonian dynamics and the corresponding adjoint variables. The reduced Hamiltonian networks can be learned by going backward in time and then minimizing a loss function derived from the conditions of the Pontryagin maximum principle. The learning process is further improved by progressively learning a posterior distribution of reduced Hamiltonians with a variational autoencoder, which leads to a more effective path exploration process. We apply our learning frameworks to control tasks and obtain competitive results.
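For orientation, the conditions of the Pontryagin maximum principle referred to above can be written out for a generic problem. The block below is a sketch under assumed notation, none of which is taken from the paper itself: state $x$, control $u$, adjoint $p$, dynamics $f$, running cost $L$, terminal cost $g$, and horizon $T$. It uses the maximization sign convention and assumes the reduced Hamiltonian $h$ is differentiable.

```latex
% Assumed problem (notation is illustrative, not the paper's):
%   minimize  J(u) = \int_0^T L(x(t), u(t))\,dt + g(x(T))
%   subject to \dot{x}(t) = f(x(t), u(t)), \quad x(0) = x_0.
\begin{align*}
  H(x, p, u) &= p^{\top} f(x, u) - L(x, u)
    && \text{(Hamiltonian)} \\
  u^{*}(x, p) &= \arg\max_{u} H(x, p, u)
    && \text{(maximum condition)} \\
  h(x, p) &= H\bigl(x, p, u^{*}(x, p)\bigr)
    && \text{(reduced Hamiltonian)} \\
  \dot{x} &= \frac{\partial h}{\partial p}(x, p), \qquad
  \dot{p} = -\frac{\partial h}{\partial x}(x, p)
    && \text{(Hamiltonian dynamics)} \\
  x(0) &= x_0, \qquad p(T) = -\nabla g\bigl(x(T)\bigr)
    && \text{(boundary conditions)}
\end{align*}
```

In this form the adjoint variable is pinned down at the terminal time $T$ rather than at $t = 0$, which is one way to read the abstract's remark about going backward in time.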