Bandit convex optimization (BCO) is a general framework for online decision-making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high-dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex. This class contains a wide range of practically relevant loss functions, including linear, quadratic, and generalized linear models. In addition to attaining optimal regret, our method is the most efficient known algorithm for several well-studied applications, including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for bandit LQR/LQG problems under a fully adversarial noise model, thereby resolving an open question posed in \citep{gradu2020non} and \citep{sun2023optimal}. Finally, we show that the more general problem of BCO with (non-affine) memory is harder: we derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses.