In this paper, we analyze the problem of online convex optimization across a range of settings: different feedback types (full-information, semi-bandit, bandit, etc.) in either stochastic or non-stochastic settings, and different notions of regret (static adversarial regret, dynamic regret, adaptive regret). We do so through a framework that lets us systematically propose and analyze meta-algorithms for these settings. We show that any algorithm for online linear optimization with deterministic gradient feedback against fully adaptive adversaries is also an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed into an algorithm that uses semi-bandit feedback, with a comparable regret bound. We further show that algorithms designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms that convert first-order algorithms into zeroth-order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, recovers several results in the literature with a simplified proof technique, and provides new results.
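To make the idea of converting a first-order algorithm into a zeroth-order one concrete, the following is a minimal sketch and not the meta-algorithm of this paper. It assumes a specific base algorithm (projected online gradient descent on a Euclidean ball) and swaps in the classical one-point spherical gradient estimator, which replaces the gradient by a scaled function value observed at a randomly perturbed point. All names (`project_ball`, `random_unit_vector`, `zeroth_order_ogd`) and parameters (`step_size`, `delta`, `radius`) are hypothetical and chosen for illustration only.

```python
import math
import random


def project_ball(x, radius):
    """Euclidean projection of x onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def random_unit_vector(dim, rng):
    """Sample a direction uniformly from the unit sphere via normalized Gaussians."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def zeroth_order_ogd(loss_fns, dim, step_size, delta, radius=1.0, seed=0):
    """Run projected gradient descent using only one function value per round.

    Assumes 0 < delta < radius. The iterate x is kept in the shrunken ball of
    radius (radius - delta), so every played point y = x + delta * u stays
    inside the original ball.
    """
    rng = random.Random(seed)
    x = [0.0] * dim
    played = []
    for f in loss_fns:
        u = random_unit_vector(dim, rng)
        y = [xi + delta * ui for xi, ui in zip(x, u)]  # point actually played
        value = f(y)  # bandit feedback: a single scalar
        # One-point estimate (dim / delta) * f(y) * u stands in for the gradient.
        g_hat = [(dim / delta) * value * ui for ui in u]
        x = project_ball(
            [xi - step_size * gi for xi, gi in zip(x, g_hat)],
            radius - delta,
        )
        played.append(y)
    return played
```

A call such as `zeroth_order_ogd([f] * 100, dim=3, step_size=0.01, delta=0.1)` for some convex `f` returns the list of played points. The base update never sees a true gradient, only the estimate built from one function evaluation per round, which is the sense in which a first-order method is run with zeroth-order feedback here.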