We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and statistically optimal under mild assumptions. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
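The abstract describes the algorithm as a modular combination of a Lagrangian technique and a regression-based bandit technique. The sketch below illustrates what that combination can look like for one round. It makes several assumptions of our own, and none of them should be read as the paper's exact algorithm:

- A hypothetical Lagrangian payoff for packing-form constraints `c(a) <= b`, namely `r(a) - lam . (c(a) - b)`. A covering constraint can be brought into this form by negating both sides.
- SquareCB-style inverse gap weighting applied to *predicted* Lagrangian payoffs. The predictions would come from regression oracles, which are not implemented here.
- A projected-gradient update on the dual variables `lam`. This is a swapped-in choice, not necessarily the dual algorithm used in LagrangeBwK.

All function names (`lagrangian_payoffs`, `inverse_gap_weighting`, `dual_update`) are hypothetical.

```python
import numpy as np


def lagrangian_payoffs(pred_reward, pred_cost, lam, budget_per_round):
    """Hypothetical predicted Lagrangian payoff for each of K actions.

    pred_reward: shape (K,), regression-oracle reward predictions (assumed given)
    pred_cost: shape (K, d), predicted consumption of each of d resources
    lam: shape (d,), nonnegative dual variables
    budget_per_round: shape (d,), per-round budget b in the constraint c(a) <= b
    """
    # r(a) - sum_i lam_i * (c_i(a) - b_i); overconsumption is penalized by lam.
    return pred_reward - (pred_cost - budget_per_round) @ lam


def inverse_gap_weighting(payoffs, gamma):
    """SquareCB-style action distribution over predicted payoffs (higher is better)."""
    K = len(payoffs)
    best = int(np.argmax(payoffs))
    gaps = payoffs[best] - payoffs
    # Each non-greedy action gets probability 1 / (K + gamma * gap) <= 1/K.
    probs = 1.0 / (K + gamma * gaps)
    probs[best] = 0.0
    # The greedy action receives the remaining mass, which is at least 1/K.
    probs[best] = 1.0 - probs.sum()
    return probs


def dual_update(lam, observed_cost, budget_per_round, eta):
    """Projected gradient step on lam (an assumed dual learner, not LagrangeBwK's)."""
    # Raise lam_i when resource i is overconsumed, lower it otherwise; keep lam >= 0.
    return np.maximum(0.0, lam + eta * (observed_cost - budget_per_round))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam = np.array([1.0])
    payoffs = lagrangian_payoffs(
        np.array([1.0, 0.8]), np.array([[0.9], [0.1]]), lam, np.array([0.5])
    )
    probs = inverse_gap_weighting(payoffs, gamma=10.0)
    action = rng.choice(len(probs), p=probs)
    print(payoffs, probs, action)
```

With `lam = 1`, the resource-hungry action 0 has the higher raw reward but the lower Lagrangian payoff (0.6 versus 1.2), so the exploration distribution concentrates on action 1. Because the primal step only needs payoff predictions and the dual step only needs observed consumption, either component could in principle be replaced independently. That separation is the kind of modularity the abstract refers to.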