Recent advances in online optimization and control have provided novel tools for studying online linear quadratic regulator (LQR) problems, in which the cost matrices vary adversarially over time. However, the controller parameterizations used in existing works may not satisfy practical requirements, such as sparsity patterns that arise from physical interconnections. In this work, we study online linear quadratic Gaussian problems with a given linear constraint imposed on the controller. Our approach is inspired by the recent work [1], which proposed a second-order method for linearly constrained policy optimization in offline LQR. That method is equipped with a Riemannian metric that arises naturally in optimal control problems. We propose online optimistic Newton on manifold (OONM), which produces an online controller from predictions of the first- and second-order information of the function sequence. To evaluate the proposed algorithm, we use the notion of regret, defined as the suboptimality of its cumulative cost relative to that of a (locally) minimizing controller sequence, and we bound the regret in terms of the path length of the minimizer sequence. Simulation results are provided to illustrate the properties of OONM.
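To make the notions of optimistic second-order updates, linearly constrained decisions, regret, and path length concrete, here is a toy sketch. It is hypothetical and is not the OONM algorithm or the LQG setting of the paper. It assumes strictly convex quadratic losses f_t(x) = 0.5 (x - c_t)^T H_t (x - c_t) in place of the LQG cost. It models the linear controller constraint as x = P z for a fixed basis P (for example, one encoding a sparsity pattern). It uses the plain Euclidean metric rather than the Riemannian metric of [1], and it predicts f_{t+1} by the most recently revealed f_t. All function names are made up for illustration.

```python
import numpy as np


def subspace_minimizer(H, c, P):
    """Minimizer of f(x) = 0.5 (x - c)^T H (x - c) over the subspace {P z}."""
    # Stationarity in z: P^T H (P z - c) = 0.
    z = np.linalg.solve(P.T @ H @ P, P.T @ H @ c)
    return P @ z


def quad_loss(H, c, x):
    d = x - c
    return 0.5 * float(d @ H @ d)


def optimistic_newton(Hs, cs, P, z0):
    """Play x_t = P z_t before f_t is revealed; use f_t as the prediction of f_{t+1}."""
    z = np.asarray(z0, dtype=float)
    played = []
    for H, c in zip(Hs, cs):
        x = P @ z
        played.append(x)
        # f_t is now revealed; take one Newton step on z -> f_t(P z),
        # so every iterate stays inside the constrained subspace.
        grad_z = P.T @ (H @ (x - c))  # gradient in constrained coordinates
        hess_z = P.T @ H @ P          # Hessian in constrained coordinates
        z = z - np.linalg.solve(hess_z, grad_z)
    return played


def regret_and_path_length(Hs, cs, P, played):
    """Dynamic regret against per-round constrained minimizers, and their path length."""
    minimizers = [subspace_minimizer(H, c, P) for H, c in zip(Hs, cs)]
    regret = sum(quad_loss(H, c, x) - quad_loss(H, c, xs)
                 for H, c, x, xs in zip(Hs, cs, played, minimizers))
    path = sum(np.linalg.norm(b - a) for a, b in zip(minimizers, minimizers[1:]))
    return regret, path, minimizers


# Toy instance: slowly drifting targets c_t and random positive definite H_t.
rng = np.random.default_rng(0)
n, k, T = 4, 2, 50
P = np.linalg.qr(rng.standard_normal((n, k)))[0]  # orthonormal basis of allowed subspace
Hs, cs = [], []
c = rng.standard_normal(n)
for _ in range(T):
    A = rng.standard_normal((n, n))
    Hs.append(A @ A.T + np.eye(n))            # strictly positive definite
    c = c + 0.05 * rng.standard_normal(n)     # small per-round drift
    cs.append(c.copy())

played = optimistic_newton(Hs, cs, P, np.zeros(k))
regret, path, minimizers = regret_and_path_length(Hs, cs, P, played)
print(f"regret = {regret:.4f}, path length = {path:.4f}")
```

Because each toy loss is quadratic, one Newton step in z lands exactly on the previous round's constrained minimizer. Each round's regret is therefore at most 0.5 * lambda_max(H_t) * ||x*_{t-1} - x*_t||^2. This shows, in this simplified setting only, how regret can be controlled by the movement of the minimizer sequence.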