We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for the variant of CBwK in which the algorithm must stop once some constraint is violated. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
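To make the combination of the two building blocks concrete, here is a minimal sketch of one round of action selection. It is an illustration under stated assumptions, not the paper's algorithm: `lagrangian_scores` uses a LagrangeBwK-style payoff (reward plus dual-weighted slack), `inverse_gap_weighting` is the SquareCB sampling rule, and the names `reward_hat`, `consumption_hat`, `lam`, `T_over_B`, and `gamma` are hypothetical. The regression oracle that would produce the predictions and the online update of the dual variables are omitted.

```python
import numpy as np

def lagrangian_scores(reward_hat, consumption_hat, lam, T_over_B):
    """Per-arm Lagrangian payoff (assumed form, for illustration only).

    reward_hat:      shape (K,)   predicted reward of each arm
    consumption_hat: shape (K, d) predicted consumption of each resource
    lam:             shape (d,)   nonnegative dual variables, one per resource
    T_over_B:        horizon divided by budget (rescales consumption)
    """
    reward_hat = np.asarray(reward_hat, dtype=float)
    consumption_hat = np.asarray(consumption_hat, dtype=float)
    lam = np.asarray(lam, dtype=float)
    # reward + sum_i lam_i * (1 - (T/B) * consumption_i)
    return reward_hat + (1.0 - T_over_B * consumption_hat) @ lam

def inverse_gap_weighting(scores, gamma):
    """SquareCB-style sampling distribution over K arms.

    Each non-greedy arm a gets probability 1 / (K + gamma * gap(a)),
    where gap(a) = best score - score(a); the greedy arm gets the rest.
    """
    scores = np.asarray(scores, dtype=float)
    K = scores.size
    best = int(np.argmax(scores))
    probs = 1.0 / (K + gamma * (scores[best] - scores))
    probs[best] = 0.0                 # placeholder, filled in below
    probs[best] = 1.0 - probs.sum()   # remaining mass goes to the greedy arm
    return probs

if __name__ == "__main__":
    scores = lagrangian_scores(
        reward_hat=[0.9, 0.6, 0.4],
        consumption_hat=[[0.5], [0.2], [0.1]],
        lam=[0.3],
        T_over_B=2.0,
    )
    probs = inverse_gap_weighting(scores, gamma=10.0)
    rng = np.random.default_rng(0)
    arm = rng.choice(len(probs), p=probs)
    print(scores, probs, arm)
```

A larger `gamma` concentrates the distribution on the greedy arm, while `gamma = 0` yields uniform exploration. The modularity mentioned in the abstract shows up here as the separation between the scoring step and the sampling step.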