We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment. Namely, we tailor the Follow-The-Regularized-Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs. The main challenge arises from using the proposed adaptive regularizers in the presence of a state, or equivalently, a memory, which couples the effects of the online decisions and requires new tools for bounding the regret. Via new analysis techniques that integrate NSC and FTRL, we obtain novel disturbance action controllers (DAC) with sublinear, data-adaptive policy regret bounds that shrink when the trajectory of costs has small gradients, while remaining sublinear even in the worst case.