We study the contextual continuum bandits problem, in which the learner sequentially receives a side-information vector (the context) and must choose an action in a convex set so as to minimize a function associated with that context. The goal is to minimize every underlying function over the received contexts, which leads to a dynamic (contextual) notion of regret that is stronger than the standard static regret. Assuming that the objective functions are Hölder continuous with respect to the contexts, we show that any algorithm achieving sub-linear static regret can be extended to achieve sub-linear dynamic regret. We further study strongly convex and smooth functions under noisy observations. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm that achieves sub-linear dynamic regret. Lastly, we present a minimax lower bound that implies two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the proposed algorithm achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.
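To make the distinction between the two regret notions concrete, the following is a sketch in notation that is assumed here and does not appear in the abstract: contexts $c_t$, actions $z_t$ in a convex set $\Theta$, an objective $f(z, c)$, a horizon $T$, and Hölder constants $L > 0$ and $\gamma \in (0, 1]$. Expectations over the noise and the learner's randomization are omitted for simplicity.

```latex
% Static regret: compare against the single best fixed action in hindsight.
\[
  R_T^{\mathrm{static}}
  = \sum_{t=1}^{T} f(z_t, c_t) - \min_{z \in \Theta} \sum_{t=1}^{T} f(z, c_t).
\]
% Dynamic (contextual) regret: compare against the best action for each context.
\[
  R_T^{\mathrm{dynamic}}
  = \sum_{t=1}^{T} \Bigl( f(z_t, c_t) - \min_{z \in \Theta} f(z, c_t) \Bigr).
\]
% A sum of minima never exceeds the minimum of a sum, hence
\[
  \sum_{t=1}^{T} \min_{z \in \Theta} f(z, c_t)
  \le \min_{z \in \Theta} \sum_{t=1}^{T} f(z, c_t)
  \quad\Longrightarrow\quad
  R_T^{\mathrm{static}} \le R_T^{\mathrm{dynamic}}.
\]
% Hölder continuity in the context (assumed form of the condition):
\[
  \lvert f(z, c) - f(z, c') \rvert \le L \, \lVert c - c' \rVert^{\gamma}
  \quad \text{for all } z \in \Theta \text{ and all contexts } c, c'.
\]
```

Under these assumed definitions, the inequality in the third display is the sense in which dynamic regret is the stronger notion: any bound on the dynamic regret also bounds the static regret, but not the other way around.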