Differential Dynamic Programming (DDP) is an efficient computational tool for solving nonlinear optimal control problems. It was originally designed as a single shooting method and thus is sensitive to the initial guess supplied. This work considers the extension of DDP to multiple shooting (MS), improving its robustness to initial guesses. A novel derivation is proposed that accounts for the defect between shooting segments during the DDP backward pass, while still maintaining quadratic convergence locally. The derivation enables unifying multiple previous MS algorithms, and opens the door to many smaller algorithmic improvements. A penalty method is introduced to strategically control the step size, further improving the convergence performance. An adaptive merit function and a more reliable acceptance condition are employed for globalization. The effects of these improvements are benchmarked for trajectory optimization with a quadrotor, an acrobot, and a manipulator. MS-DDP is also demonstrated for use in Model Predictive Control (MPC) for dynamic jumping with a quadruped robot, showing its benefits over a single shooting approach.
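To make the notion of a defect between shooting segments concrete, the following is a minimal sketch in Python. It is not the derivation or algorithm proposed in this work: the pendulum dynamics, the explicit Euler step, and the names `step`, `rollout_segment`, and `defects` are all assumptions chosen for illustration. It only shows that an arbitrary guess for the segment start states generally leaves nonzero defects, whereas states generated by a single forward rollout leave none.

```python
import numpy as np

def step(x, u, dt=0.05):
    """Hypothetical dynamics (assumption): undamped pendulum, explicit Euler."""
    theta, omega = x
    return np.array([theta + dt * omega, omega + dt * (u - np.sin(theta))])

def rollout_segment(x_start, controls):
    """Integrate one shooting segment forward from its own start state."""
    x = np.asarray(x_start, dtype=float)
    for u in controls:
        x = step(x, u)
    return x

def defects(nodes, controls, seg_len):
    """Defect j = (end of segment j, rolled out from node j) - node j+1."""
    out = []
    for j in range(len(nodes) - 1):
        seg_u = controls[j * seg_len:(j + 1) * seg_len]
        out.append(rollout_segment(nodes[j], seg_u) - nodes[j + 1])
    return np.array(out)

seg_len = 5
controls = np.zeros(4 * seg_len)  # 4 segments of 5 steps each

# Multiple-shooting style guess: node states interpolated linearly,
# ignoring the dynamics, so the segments do not join up.
nodes = np.linspace([0.5, 0.0], [0.0, 0.0], 5)
d_guess = defects(nodes, controls, seg_len)

# Single-shooting style guess: every node is the rollout of the previous
# one, so the trajectory is continuous and all defects vanish.
consistent = [nodes[0]]
for j in range(4):
    seg_u = controls[j * seg_len:(j + 1) * seg_len]
    consistent.append(rollout_segment(consistent[-1], seg_u))
d_consistent = defects(np.array(consistent), controls, seg_len)

print("defect norms (interpolated guess):", np.linalg.norm(d_guess, axis=1))
print("defect norms (rollout guess):     ", np.linalg.norm(d_consistent, axis=1))
```

A multiple shooting solver is free to start from the interpolated guess and drive these defects to zero over its iterations, which is the source of the robustness to initial guesses described above.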