This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative, and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations which arise from applying variable elimination on the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that the cost of computing the trajectory derivatives scales linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products, and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on both a synthetic benchmark and four challenging learning-from-demonstration benchmarks, including a 6-DoF maneuvering quadrotor and 6-DoF rocket-powered landing.
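To make the idea of eliminating the Lagrange multiplier terms from the differential KKT system concrete, the following is a minimal sketch on a toy problem, not the paper's implementation. It assumes a scalar linear system with quadratic cost, a parameter vector `theta = (x0, target)`, and dense NumPy linear algebra; the helper names `build`, `solve_kkt`, and `dz_dtheta` are hypothetical. With H the Hessian of the Lagrangian, A the constraint Jacobian, B the mixed second derivative of the Lagrangian, and C the derivative of the constraints with respect to the parameters, eliminating the multiplier derivatives gives Dz = H^{-1} A^T (A H^{-1} A^T)^{-1} (A H^{-1} B - C) - H^{-1} B. Because the sketch uses dense solves, it does not exploit any temporal structure and therefore does not demonstrate linear scaling in the number of timesteps.

```python
import numpy as np

def build(theta, T=5, a=0.9, b=0.5, r=0.1, w=1.0):
    """Toy problem (an assumption for illustration only).

    Variables z = [u_0..u_{T-1}, x_1..x_T]; dynamics x_{t+1} = a x_t + b u_t
    with x_0 = theta[0]; cost sum_t 0.5 r u_t^2 + 0.5 w (x_{t+1} - theta[1])^2.
    """
    x0, target = theta
    H = np.diag(np.concatenate([np.full(T, r), np.full(T, w)]))
    q = np.concatenate([np.zeros(T), np.full(T, -w * target)])
    A = np.zeros((T, 2 * T))
    for t in range(T):
        A[t, t] = -b              # coefficient of u_t
        A[t, T + t] = 1.0         # coefficient of x_{t+1}
        if t > 0:
            A[t, T + t - 1] = -a  # coefficient of x_t
    rhs = np.zeros(T)
    rhs[0] = a * x0               # known initial state moves to the right-hand side
    B = np.zeros((2 * T, 2))      # d^2 L / (dz dtheta)
    B[T:, 1] = -w
    C = np.zeros((T, 2))          # dh / dtheta, with h(z, theta) = A z - rhs
    C[0, 0] = -a
    return H, q, A, rhs, B, C

def solve_kkt(H, q, A, rhs):
    """Solve min 0.5 z'Hz + q'z subject to A z = rhs via the full KKT system."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    return np.linalg.solve(K, np.concatenate([-q, rhs]))[:n]

def dz_dtheta(H, A, B, C):
    """Trajectory derivative after eliminating the multiplier derivatives."""
    Hinv_AT = np.linalg.solve(H, A.T)
    Hinv_B = np.linalg.solve(H, B)
    S = A @ Hinv_AT               # A H^{-1} A^T
    return Hinv_AT @ np.linalg.solve(S, A @ Hinv_B - C) - Hinv_B

theta = np.array([1.0, 0.3])
H, q, A, rhs, B, C = build(theta)
J = dz_dtheta(H, A, B, C)

# Check against central finite differences of the optimal trajectory.
eps = 1e-6
J_fd = np.zeros_like(J)
for j in range(theta.size):
    d = np.zeros_like(theta)
    d[j] = eps
    z_plus = solve_kkt(*build(theta + d)[:4])
    z_minus = solve_kkt(*build(theta - d)[:4])
    J_fd[:, j] = (z_plus - z_minus) / (2 * eps)

max_err = np.abs(J - J_fd).max()
print("max abs difference vs finite differences:", max_err)
```

In this toy, H is diagonal and each row of A touches only the variables of adjacent timesteps, which gives a flavor of the kind of structure the abstract refers to; a structure-aware solver would replace the dense `np.linalg.solve` calls.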