In this paper, we analyze the problem of online convex optimization in different settings. We show that any algorithm for online linear optimization with fully adaptive adversaries is an algorithm for online convex optimization. We also show that any such algorithm that requires full-information feedback may be transformed into an algorithm with semi-bandit feedback with a comparable regret bound. We further show that algorithms designed for fully adaptive adversaries using deterministic semi-bandit feedback can obtain similar bounds using only stochastic semi-bandit feedback when facing oblivious adversaries. We use this to describe general meta-algorithms that convert first-order algorithms to zeroth-order algorithms with comparable regret bounds. Our framework allows us to analyze online optimization in various settings, such as full-information feedback, bandit feedback, stochastic regret, adversarial regret, and various forms of non-stationary regret. Using our analysis, we provide the first efficient projection-free online convex optimization algorithm using linear optimization oracles.
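The first claim above rests on a standard convexity fact: for a convex loss f_t with gradient g_t at the played point x_t, f_t(x_t) - f_t(u) is at most <g_t, x_t - u>, so regret against the linear surrogates upper-bounds regret against the convex losses. The sketch below illustrates this numerically. It is only an illustration under our own assumptions, not the paper's algorithm: the quadratic losses, the interval [-1, 1], the step size, and the helper names `clip` and `run_linearized_ogd` are all hypothetical choices, and projected online gradient descent stands in for a generic online linear optimization learner.

```python
import random


def clip(x, lo=-1.0, hi=1.0):
    """Project a scalar onto the interval [lo, hi] (assumed feasible set)."""
    return max(lo, min(hi, x))


def run_linearized_ogd(targets, step=0.1):
    """Projected online gradient descent that only ever sees the linear
    surrogate x -> g_t * x, where g_t is the gradient of the (assumed)
    convex loss f_t(x) = (x - z_t)^2 at the played point x_t."""
    x = 0.0
    iterates, grads = [], []
    for z in targets:
        g = 2.0 * (x - z)          # gradient of f_t at x_t
        iterates.append(x)
        grads.append(g)
        x = clip(x - step * g)     # update uses the linear feedback only
    return iterates, grads


random.seed(0)
targets = [random.uniform(-0.5, 0.5) for _ in range(200)]
xs, gs = run_linearized_ogd(targets)

# Fixed comparator: the mean target, which lies inside [-1, 1].
u = sum(targets) / len(targets)

# Regret measured on the convex losses versus on the linear surrogates.
convex_regret = sum((x - z) ** 2 - (u - z) ** 2 for x, z in zip(xs, targets))
linear_regret = sum(g * (x - u) for g, x in zip(gs, xs))

print("convex regret:", convex_regret)
print("linear regret:", linear_regret)
```

For these quadratic losses the gap between the two quantities is exactly the sum of (x_t - u)^2, so the linear regret is never smaller than the convex regret, whichever comparator u is chosen.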