Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high-dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of the horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex. This class contains a wide range of practically relevant loss functions, including linear, quadratic, and generalized linear models. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications, including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This yields an algorithm with optimal regret for bandit LQR/LQG problems under a fully adversarial noise model, thereby resolving an open question posed in \citet{gradu2020non} and \citet{sun2023optimal}. Finally, we show that the more general problem of BCO with (non-affine) memory is harder: we derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even when the losses are smooth and quadratic.
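For context, the classical full-information online Newton step (Hazan, Agarwal, and Kale, 2007) that inspires our method updates the iterate by a Newton-like step preconditioned on accumulated gradient outer products. The display below is a minimal recap for reference only; the symbols $\gamma$ (step parameter), $\varepsilon$ (initialization), and $\Pi_{\mathcal{K}}^{A}$ (generalized projection onto the feasible set $\mathcal{K}$) are standard notation introduced here for illustration:
\[
A_t = A_{t-1} + \nabla_t \nabla_t^{\top}, \qquad
x_{t+1} = \Pi_{\mathcal{K}}^{A_t}\!\Big( x_t - \tfrac{1}{\gamma}\, A_t^{-1} \nabla_t \Big), \qquad
\nabla_t := \nabla f_t(x_t),
\]
with $A_0 = \varepsilon I$ and $\Pi_{\mathcal{K}}^{A}(y) = \arg\min_{x \in \mathcal{K}} (x - y)^{\top} A \, (x - y)$, the projection in the norm induced by $A$. In the bandit setting the gradient $\nabla_t$ is not observed directly, and a standard substitute (in the spirit of Flaxman, Kalai, and McMahan, 2005) is a single-point estimate built from the observed loss value alone.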