This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization by considering high-dimensional dynamic quadratic hitting costs and $\ell_2$-norm switching costs in a noisy bandit feedback model. For a general class of stochastic environments, we provide the first algorithm, SCaLE, that provably achieves distribution-agnostic sublinear dynamic regret without knowledge of the hitting-cost structure. En route, we present a novel spectral regret analysis that separately quantifies eigenvalue-error-driven regret and eigenbasis-perturbation-driven regret. Extensive numerical experiments against online-learning baselines corroborate our claims and highlight the statistical consistency of our algorithm.
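To make the cost model concrete, the following minimal sketch evaluates the cumulative cost of a decision sequence under a quadratic hitting cost plus an $\ell_2$-norm switching cost, and the dynamic regret against a comparator sequence. The specific hitting-cost form $\tfrac{1}{2}(x - v_t)^\top A_t (x - v_t)$, the names `A_seq`, `v_seq`, `x0`, and all function names are illustrative assumptions, not taken from the paper; the sketch does not implement SCaLE or the bandit feedback model.

```python
import math


def quadratic_hitting_cost(x, A, v):
    """Assumed form: 0.5 * (x - v)^T A (x - v), with A given as nested lists."""
    d = [xi - vi for xi, vi in zip(x, v)]
    # Matrix-vector product A @ d, computed row by row.
    Ad = [sum(a_ij * d_j for a_ij, d_j in zip(row, d)) for row in A]
    return 0.5 * sum(d_i * ad_i for d_i, ad_i in zip(d, Ad))


def switching_cost(x, x_prev):
    """l2-norm movement cost ||x - x_prev||_2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prev)))


def total_cost(xs, A_seq, v_seq, x0):
    """Sum of hitting and switching costs over the horizon, starting from x0."""
    total, prev = 0.0, x0
    for x, A, v in zip(xs, A_seq, v_seq):
        total += quadratic_hitting_cost(x, A, v) + switching_cost(x, prev)
        prev = x
    return total


def dynamic_regret(xs, comparator, A_seq, v_seq, x0):
    """Cost of the played sequence minus cost of a time-varying comparator."""
    return total_cost(xs, A_seq, v_seq, x0) - total_cost(comparator, A_seq, v_seq, x0)
```

As a usage example under these assumptions, with a single round, `A = [[2.0, 0.0], [0.0, 1.0]]`, `v = [1.0, 1.0]`, and `x0 = [0.0, 0.0]`, staying at the origin incurs only hitting cost, while moving to `v` incurs only switching cost; `dynamic_regret` reports the difference between the two.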