In this paper, we analyze the problem of online convex optimization in several settings. We show that any algorithm for online linear optimization against fully adaptive adversaries is also an algorithm for online convex optimization. We further show that any such algorithm requiring full-information feedback can be transformed into an algorithm using semi-bandit feedback with a comparable regret bound. Moreover, algorithms designed for fully adaptive adversaries with deterministic semi-bandit feedback attain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. Building on this, we describe general meta-algorithms that convert first-order algorithms into zeroth-order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in a variety of settings, such as full-information feedback, bandit feedback, stochastic regret, adversarial regret, and various forms of non-stationary regret.