We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for the variant of CBwK in which the algorithm must stop once some constraint is violated. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
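To make the combination of the two techniques concrete, the following is a minimal sketch, not the paper's algorithm. It shows the inverse-gap-weighting rule that SquareCB uses to turn a regression oracle's predictions into action probabilities, applied to Lagrangian-adjusted scores. The function names, the exact form of the Lagrangian score (predicted reward plus dual-weighted slack between a per-round budget rate and predicted consumption), and the fixed dual vector are all assumptions made for illustration; in a Lagrangian approach the dual variables would be updated by a separate online learning procedure, which is omitted here.

```python
import numpy as np


def inverse_gap_weighting(scores, gamma):
    """Map predicted scores to action probabilities (SquareCB-style rule).

    Each non-greedy action a gets probability 1 / (K + gamma * gap(a)),
    where gap(a) is the greedy score minus the score of a; the greedy
    action receives the remaining probability mass.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.size
    best = int(np.argmax(scores))
    gaps = scores[best] - scores          # nonnegative by construction
    probs = 1.0 / (k + gamma * gaps)
    probs[best] = 0.0                     # exclude greedy action from the sum
    probs[best] = 1.0 - probs.sum()       # leftover mass goes to greedy action
    return probs


def lagrangian_scores(reward_hat, consumption_hat, budget_rate, dual):
    """Hypothetical Lagrangian-adjusted score for each action.

    reward_hat:      shape (K,)   predicted rewards from a regression oracle
    consumption_hat: shape (K, d) predicted per-resource consumption
    budget_rate:     shape (d,)   assumed per-round budget for each resource
    dual:            shape (d,)   current dual variables (held fixed here)
    """
    reward_hat = np.asarray(reward_hat, dtype=float)
    consumption_hat = np.asarray(consumption_hat, dtype=float)
    slack = np.asarray(budget_rate, dtype=float) - consumption_hat
    return reward_hat + slack @ np.asarray(dual, dtype=float)


def choose_action(reward_hat, consumption_hat, budget_rate, dual, gamma, rng):
    """Sample one action from the inverse-gap-weighted distribution."""
    scores = lagrangian_scores(reward_hat, consumption_hat, budget_rate, dual)
    probs = inverse_gap_weighting(scores, gamma)
    return int(rng.choice(len(probs), p=probs)), probs
```

In this sketch a larger `gamma` concentrates probability on the action with the highest adjusted score, while a larger dual variable penalizes actions predicted to consume more of the corresponding resource than the assumed per-round budget.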