Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, and related areas. It has been studied extensively in the full-information setting but remains underexplored under bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis of the non-consecutive gradient variation, a fundamental quantity in gradient-variation analysis with bandit feedback, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis of the non-consecutive gradient variation also yields other favorable problem-dependent guarantees, such as gradient-variance and small-loss regret bounds. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.