We consider bandits with anytime knapsacks (BwAK), a novel variant of the bandits with knapsacks (BwK) problem in which an \textit{anytime} cost constraint replaces the total cost budget. This setting is more demanding because the constraint must be satisfied throughout the decision-making process. We propose SUAK, an algorithm that uses upper confidence bounds to identify the optimal mixture of arms while balancing exploration and exploitation. SUAK is adaptive: in each round it decides how to use the available budget, and it skips the round when playing could violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ that prior work established for the simpler BwK framework. Finally, we provide simulations to verify the utility of SUAK in practical settings.
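The skipping idea can be illustrated with a minimal sketch. It is not SUAK itself: it has a single arm, no upper confidence bounds, and no deliberate under-utilization. It also rests on assumptions the abstract does not state, namely that costs lie in [0, 1], that the anytime constraint has the form "cumulative cost after round t is at most budget_rate * t", and that a skipped round costs nothing. The names `can_play`, `simulate`, `budget_rate`, and `mean_cost` are hypothetical.

```python
import random


def can_play(spent, t, budget_rate, max_cost=1.0):
    """Return True if even a worst-case cost in round t keeps
    spent <= budget_rate * t (the assumed anytime constraint)."""
    return spent + max_cost <= budget_rate * t


def simulate(horizon, budget_rate, mean_cost, seed=0):
    """Play one arm with Bernoulli(mean_cost) costs, skipping unsafe rounds.

    Returns the cumulative cost after each round and the number of skips.
    """
    rng = random.Random(seed)
    spent = 0.0
    skips = 0
    history = []
    for t in range(1, horizon + 1):
        if can_play(spent, t, budget_rate):
            # Safe to play: the realized cost is 0 or 1.
            spent += 1.0 if rng.random() < mean_cost else 0.0
        else:
            # Playing could break the constraint, so skip at zero cost.
            skips += 1
        history.append(spent)
    return history, skips


if __name__ == "__main__":
    history, skips = simulate(horizon=1000, budget_rate=0.3, mean_cost=0.5)
    print("final cumulative cost:", history[-1], "skipped rounds:", skips)
```

Because the arm's mean cost (0.5) exceeds the assumed per-round rate (0.3), this naive rule skips often. That is the situation the abstract's under-utilization is meant to ease: spending slightly below the allowed rate leaves slack, so fewer rounds fail the safety check.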