Regret-Optimal LQR Control

We consider the infinite-horizon LQR control problem. Motivated by competitive analysis in online learning, we introduce dynamic regret as a criterion for controller design. It is defined as the difference between the LQR cost of a causal controller (which has access only to past disturbances) and the LQR cost of the \emph{unique} clairvoyant controller, which also has access to future disturbances and is known to dominate all other controllers. The regret is itself a function of the disturbances, and we propose to find a causal controller that minimizes the worst-case regret over all bounded-energy disturbances. The resulting controller can be interpreted as guaranteeing the smallest worst-case regret relative to the best non-causal controller, which can see the future. We derive explicit formulas for the optimal regret and for the regret-optimal controller in the state-space setting. These formulas are obtained by showing that the regret-optimal control problem reduces to a Nehari extension problem, which can be solved explicitly. The regret-optimal controller is shown to be linear and can be expressed as the sum of the classical $H_2$ state-feedback law and an $n$-th order controller, where $n$ is the state dimension. Its construction requires only a solution to the standard LQR Riccati equation and two Lyapunov equations. Simulations over a range of plants demonstrate that the regret-optimal controller interpolates between the $H_2$-optimal and $H_\infty$-optimal controllers, and that its $H_2$ and $H_\infty$ costs are generally both close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control systems design.
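
The criterion described above can be sketched in symbols. The notation here is an assumption made for illustration and is not fixed by the text: $x_t$, $u_t$, and $w_t$ denote the state, control, and disturbance; $A$, $B$, $Q$, $R$ are the usual LQR system and weight matrices; the disturbance is assumed to enter the state equation additively; and $K$ is a linear causal map from disturbances to controls, so that $u = Kw$.

```latex
\begin{align*}
  % LQR cost of a control sequence u under a disturbance sequence w
  J(u, w) &= \sum_{t} \left( x_t^\top Q x_t + u_t^\top R u_t \right),
  \qquad x_{t+1} = A x_t + B u_t + w_t, \\
  % worst-case regret against the clairvoyant (non-causal) minimizer
  \mathrm{Regret}(K) &= \sup_{\|w\|_2 \le 1}
    \Bigl( J(Kw, w) - \min_{u} J(u, w) \Bigr), \\
  % regret-optimal design: minimize the worst-case regret over causal K
  K_{\mathrm{opt}} &\in \arg\min_{K \text{ causal}} \mathrm{Regret}(K).
\end{align*}
```

In this sketch the inner minimum over $u$ is unconstrained, so it is attained by the clairvoyant controller, which may depend on the entire disturbance sequence. The outer minimization is restricted to causal $K$. Because both costs are quadratic in $w$, taking the supremum over the unit ball amounts to bounding the regret per unit of disturbance energy.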
