Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) control are foundational and extensively researched problems in optimal control. We investigate LQR and LQG problems with semi-adversarial perturbations and time-varying adversarial loss functions observed under bandit feedback. The best-known sublinear regret algorithm of~\cite{gradu2020non} has regret scaling as $T^{\frac{3}{4}}$ in the time horizon $T$, and its authors posed as an open question whether the tight rate of $\sqrt{T}$ could be achieved. We answer in the affirmative, giving an algorithm for bandit LQR and LQG that attains optimal regret (up to logarithmic factors) for both known and unknown systems. A central component of our method is a new scheme for bandit convex optimization with memory, which is of independent interest.
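To make the setting of bandit convex optimization with memory concrete, the following is a minimal Python sketch of a generic baseline. It is not the new scheme proposed here. At each round the learner plays a perturbed point and observes only the scalar loss $f_t$ of its last $H+1$ plays, with $H$ the memory length. It then updates using the classical one-point spherical gradient estimator applied to the unary loss $x \mapsto f_t(x,\dots,x)$. The function names (`bandit_gd_with_memory`, `quad_loss`), the ball-shaped decision set, and the default step size $\eta$ and smoothing radius $\delta$ are hypothetical choices made for illustration.

```python
import math
import random
from collections import deque
from typing import Callable, Deque, List, Sequence

Vector = List[float]


def sphere_direction(d: int, rng: random.Random) -> Vector:
    """Sample a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project_to_ball(x: Vector, radius: float) -> Vector:
    """Euclidean projection onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in x))
    return list(x) if norm <= radius else [c * radius / norm for c in x]


def bandit_gd_with_memory(
    loss_fn: Callable[[int, Sequence[Vector]], float],
    d: int,
    memory: int,
    horizon: int,
    radius: float = 1.0,
    delta: float = 0.1,
    eta: float = 1e-3,
    seed: int = 0,
) -> List[Vector]:
    """Hypothetical baseline: projected gradient descent driven by one-point
    estimates for losses f_t(y_{t-memory}, ..., y_t) under bandit feedback.
    Returns the unperturbed iterates x_0, ..., x_{horizon-1}."""
    rng = random.Random(seed)
    x: Vector = [0.0] * d
    played: Deque[Vector] = deque(maxlen=memory + 1)  # recent perturbed points y_s
    dirs: Deque[Vector] = deque(maxlen=memory + 1)    # their directions u_s
    iterates: List[Vector] = []
    for t in range(horizon):
        iterates.append(list(x))
        u = sphere_direction(d, rng)
        played.append([xi + delta * ui for xi, ui in zip(x, u)])
        dirs.append(u)
        if len(played) < memory + 1:
            continue  # the loss with memory needs a full window of past plays
        value = loss_fn(t, list(played))  # only a scalar is revealed
        # Each u_s is independent, so (d / delta) * value * u_s estimates the
        # partial gradient in the s-th argument of the smoothed loss; their sum
        # estimates the gradient of the unary loss x -> f_t(x, ..., x).
        grad = [(d / delta) * value * sum(u_s[k] for u_s in dirs) for k in range(d)]
        # Shrink the feasible ball by delta so every played point stays
        # within the original radius.
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, grad)], radius - delta)
    return iterates


# Usage: a fixed quadratic loss with memory 2 whose unary form is ||x - target||^2.
target = [0.5, -0.3]


def quad_loss(t: int, window: Sequence[Vector]) -> float:
    return sum(sum((yi - ci) ** 2 for yi, ci in zip(y, target)) for y in window) / len(window)


xs = bandit_gd_with_memory(quad_loss, d=2, memory=2, horizon=20000)
print(xs[-1])
```

The loss is held fixed here only so the behavior is easy to check. In the setting above, $f_t$ may change adversarially from round to round. Because the perturbation directions in the window are drawn independently, their per-argument estimates can simply be summed to estimate the gradient of the unary loss.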