This paper brings the concept of "optimism" to the new and promising framework of online Non-stochastic Control (NSC). Specifically, we study how NSC can benefit from a prediction oracle of unknown quality that forecasts future costs. We first reduce this problem to one of optimistic learning with delayed feedback, which we handle through the Optimistic Follow the Regularized Leader (OFTRL) algorithmic family. This reduction enables the design of OptFTRL-C, the first Disturbance Action Controller (DAC) with optimistic policy regret bounds. These new bounds scale with the oracle's accuracy, ranging from $\mathcal{O}(1)$ for perfect predictions to the order-optimal $\mathcal{O}(\sqrt{T})$ even when all predictions fail. By addressing the challenge of incorporating untrusted predictions into control systems, our work advances the NSC framework and paves the way toward effective and robust learning-based controllers.
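The central mechanism named above, Optimistic FTRL, can be illustrated in its simplest setting. The sketch below is a generic OFTRL learner for linear losses $f_t(x) = \langle g_t, x\rangle$ over a Euclidean ball of radius $R$ with regularizer $\|x\|^2/(2\eta)$. It is not OptFTRL-C: it ignores delayed feedback, the DAC policy parameterization and the system dynamics. The names `oftrl`, `project_to_ball`, `regret`, `eta` and `radius` are hypothetical. The sketch shows how a hint $m_t$ of the next gradient enters the update $x_t = \mathrm{Proj}_R\big(-\eta(\sum_{s<t} g_s + m_t)\big)$. With perfect hints, regret stays below the constant $R^2/(2\eta)$. As hints worsen, an extra term proportional to $\eta \sum_t \|g_t - m_t\|^2$ appears.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def project_to_ball(v: Sequence[float], radius: float) -> Vector:
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    n = norm(v)
    if n <= radius:
        return list(v)
    return [x * radius / n for x in v]


def oftrl(grads: List[Vector], hints: List[Vector], eta: float, radius: float) -> List[Vector]:
    """Optimistic FTRL with regularizer ||x||^2 / (2 * eta) on an L2 ball.

    x_t = argmin_{||x|| <= radius} <sum_{s<t} g_s + m_t, x> + ||x||^2 / (2 * eta),
    whose closed form is the projection of -eta * (sum_{s<t} g_s + m_t).
    hints[t] is the prediction m_t of grads[t], available before x_t is chosen.
    """
    dim = len(grads[0])
    cum = [0.0] * dim  # sum of the true gradients revealed so far
    plays = []
    for g_t, m_t in zip(grads, hints):
        # Optimistic step: treat the hint m_t as if it were the upcoming gradient.
        x_t = project_to_ball([-eta * (c + m) for c, m in zip(cum, m_t)], radius)
        plays.append(x_t)
        # Feedback: the true gradient g_t is revealed only after x_t is played.
        cum = [c + g for c, g in zip(cum, g_t)]
    return plays


def regret(grads: List[Vector], plays: List[Vector], radius: float) -> float:
    """Regret against the best fixed point of the ball in hindsight."""
    incurred = sum(sum(g * x for g, x in zip(g_t, x_t)) for g_t, x_t in zip(grads, plays))
    total = [sum(col) for col in zip(*grads)]
    # For linear losses, min_{||u|| <= R} <total, u> = -R * ||total||.
    return incurred - (-radius * norm(total))


if __name__ == "__main__":
    rng = random.Random(0)
    T, d, R, eta = 200, 3, 1.0, 0.1
    grads = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    noisy = [[g + rng.gauss(0, 0.5) for g in g_t] for g_t in grads]
    zero = [[0.0] * d for _ in range(T)]  # no prediction: plain FTRL
    for name, hints in [("perfect", grads), ("noisy", noisy), ("none", zero)]:
        print(name, round(regret(grads, oftrl(grads, hints, eta, R), R), 3))
```

With a fixed `eta`, the sketch only shows how regret depends on prediction error. Recovering the $\mathcal{O}(\sqrt{T})$ rate when all hints fail requires tuning `eta` to $T$ or adapting it to the observed errors $\|g_t - m_t\|^2$, and this sketch does neither.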