This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations that arise from applying variable elimination to the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that computing the trajectory derivatives scales linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on both a synthetic benchmark and four challenging learning-from-demonstration benchmarks, including a 6-DoF maneuvering quadrotor and 6-DoF rocket-powered landing.
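As a toy illustration of the general idea only (not the paper's structured, linear-time algorithm for trajectories), the sketch below assumes a small dense equality-constrained QP, min 0.5 xᵀHx + (c₀ + Bθ)ᵀx subject to Ax = b, whose linear cost depends affinely on a parameter θ. It computes the IFT Jacobian dx*/dθ two ways: by solving the full differential KKT system, and by eliminating the multiplier differential dλ through the Schur complement A H⁻¹ Aᵀ. All function names and problem data here are hypothetical, chosen for illustration.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not the paper's COC setting):
#   min_x 0.5 x^T H x + (c0 + B theta)^T x   s.t.  A x = b

def solve_qp(H, c, A, b):
    """Solve the equality-constrained QP via its full KKT system."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    return sol[:n], sol[n:]  # primal x*, multipliers lambda*

def dx_dtheta_full_kkt(H, A, B):
    """IFT Jacobian from the differential KKT system:
    H dx + A^T dlam = -B,  A dx = 0."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = np.vstack([-B, np.zeros((m, B.shape[1]))])
    return np.linalg.solve(K, rhs)[:n]

def dx_dtheta_eliminated(H, A, B):
    """Same Jacobian after eliminating dlam:
    dx = -H^{-1}(B + A^T dlam),  dlam = -(A H^{-1} A^T)^{-1} A H^{-1} B."""
    Hinv_B = np.linalg.solve(H, B)
    Hinv_At = np.linalg.solve(H, A.T)
    S = A @ Hinv_At  # Schur complement A H^{-1} A^T
    dlam = -np.linalg.solve(S, A @ Hinv_B)
    return -(Hinv_B + Hinv_At @ dlam)

# Random, well-posed instance: H positive definite, A full row rank.
rng = np.random.default_rng(0)
n, m, p = 6, 2, 3
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c0 = rng.standard_normal(n)
B = rng.standard_normal((n, p))

J_full = dx_dtheta_full_kkt(H, A, B)
J_elim = dx_dtheta_eliminated(H, A, B)

# Central finite differences as an independent check.
theta = rng.standard_normal(p)
eps = 1e-4
J_fd = np.zeros((n, p))
for j in range(p):
    e = np.zeros(p)
    e[j] = eps
    x_plus, _ = solve_qp(H, c0 + B @ (theta + e), A, b)
    x_minus, _ = solve_qp(H, c0 + B @ (theta - e), A, b)
    J_fd[:, j] = (x_plus - x_minus) / (2 * eps)
```

Both routes yield the same Jacobian; the eliminated form only ever factorizes H and the smaller m×m Schur complement, which hints at why exploiting the structure of these terms can pay off at scale.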