Optimal control (OC) algorithms such as Differential Dynamic Programming (DDP) take advantage of the derivatives of the dynamics to efficiently control physical systems. Yet, in the presence of nonsmooth dynamical systems, this class of algorithms is likely to fail, for instance because of discontinuities in the derivatives of the dynamics or because the gradients are non-informative. In contrast, reinforcement learning (RL) algorithms have shown better empirical results in scenarios exhibiting nonsmooth effects (contacts, friction, etc.). Our approach leverages recent work on randomized smoothing (RS) to tackle the nonsmoothness issues commonly encountered in optimal control, and provides key insights into the interplay between RL and OC through the prism of RS methods. This naturally leads us to introduce the randomized Differential Dynamic Programming (R-DDP) algorithm, which handles deterministic but nonsmooth dynamics in a very sample-efficient way. The experiments demonstrate that our method solves classic robotic problems with dry friction and frictional contacts, where classical OC algorithms are likely to fail and RL algorithms in practice require a prohibitive number of samples to find an optimal solution.
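Randomized smoothing replaces a nonsmooth function $f$ by its Gaussian average $f_\sigma(x) = \mathbb{E}_{z \sim \mathcal{N}(0,1)}[f(x + \sigma z)]$. The gradient of $f_\sigma$ can be estimated from evaluations of $f$ alone, even where the classical derivative is zero or undefined. The sketch below is a generic, scalar illustration of that idea under our own assumptions. The sign function stands in for the direction term of Coulomb dry friction, and the names `smoothed_value`, `smoothed_grad` and `exact_smoothed_sign_grad` are hypothetical. It is not the R-DDP algorithm itself, only the basic Gaussian smoothing estimator.

```python
import math
import random


def sign(x: float) -> float:
    """Nonsmooth example (assumption): sign(x), as in the direction of Coulomb friction."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def smoothed_value(f, x, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of f_sigma(x) = E_z[f(x + sigma * z)], z ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(x + sigma * rng.gauss(0.0, 1.0))
    return total / n_samples


def smoothed_grad(f, x, sigma, n_samples=10_000, seed=0):
    """Zeroth-order estimate of d/dx f_sigma(x).

    Uses the Gaussian identity d/dx f_sigma(x) = E[f(x + sigma z) z] / sigma,
    averaged over antithetic pairs (z, -z) to reduce variance. Only evaluations
    of f are needed, so it works where f' is zero or undefined.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        total += (f(x + sigma * z) - f(x - sigma * z)) * z / (2.0 * sigma)
    return total / n_samples


def exact_smoothed_sign_grad(x, sigma):
    """Closed form for f = sign: f_sigma(x) = 2 * Phi(x / sigma) - 1, so its
    derivative is 2 * phi(x / sigma) / sigma, with phi the standard normal pdf."""
    u = x / sigma
    return 2.0 * math.exp(-0.5 * u * u) / (math.sqrt(2.0 * math.pi) * sigma)


if __name__ == "__main__":
    x, sigma = 0.05, 0.1
    # Classical derivative via central finite differences: zero almost everywhere.
    h = 1e-6
    print("finite-difference grad:", (sign(x + h) - sign(x - h)) / (2 * h))
    # Smoothed gradient: informative, and it matches the closed form.
    print("smoothed grad (MC):   ", smoothed_grad(sign, x, sigma, n_samples=20_000))
    print("smoothed grad (exact):", exact_smoothed_sign_grad(x, sigma))
```

The finite-difference derivative of `sign` vanishes away from the origin, so a gradient-based solver receives no signal. The smoothed gradient is strictly positive and points toward the discontinuity. The smoothing scale `sigma` trades bias (distance from the true $f$) against the size of the region where the gradient is informative.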