We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to a dynamic (contextual) notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are H\"older with respect to the contexts, we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret. We further study the case of strongly convex and smooth functions when the observations are noisy. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret. Lastly, we present a minimax lower bound, implying two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the algorithm that we propose achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.