The bandits with knapsacks (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and that the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework that allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when the initial budget satisfies $B=\Omega(T)$, where $T$ is the time horizon, or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound, which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
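To make the primal-dual idea concrete, the following is a minimal Python sketch of a generic Lagrangian loop with replenishment. It is not the algorithm analyzed in the paper. It assumes a single resource, full feedback, Hedge as the primal regret minimizer, projected gradient descent on the dual variable, and a hypothetical "safe" arm 0 whose cost is never positive. The function name, step sizes, and the dual cap `1 / rho` are illustrative choices.

```python
import math
import random


def primal_dual_sketch(rewards, costs, budget, eta_primal=0.1, eta_dual=0.1, seed=0):
    """Generic Lagrangian primal-dual loop (illustrative, not the paper's method).

    rewards[t][a] lies in [0, 1]; costs[t][a] lies in [-1, 1], where a negative
    cost means the resource is replenished. Assumption: arm 0 is a "safe" arm
    with costs[t][0] <= 0 for every round t.
    """
    rng = random.Random(seed)
    T, K = len(rewards), len(rewards[0])
    rho = budget / T            # per-round budget
    lam_max = 1.0 / rho         # cap on the dual variable (illustrative choice)
    log_w = [0.0] * K           # Hedge log-weights (primal regret minimizer)
    lam = 0.0                   # dual variable (price of the resource)
    remaining = float(budget)
    min_remaining = remaining
    total_reward = 0.0

    for t in range(T):
        # Primal decision: sample an arm from the Hedge distribution.
        m = max(log_w)
        weights = [math.exp(w - m) for w in log_w]
        if remaining < 1.0:
            arm = 0             # cannot cover a worst-case cost: play the safe arm
        else:
            arm = rng.choices(range(K), weights=weights)[0]

        r, c = rewards[t][arm], costs[t][arm]
        total_reward += r
        remaining -= c          # c < 0 replenishes the resource
        min_remaining = min(min_remaining, remaining)

        # Primal update on the Lagrangian payoff r - lam * (c - rho) of every arm.
        for a in range(K):
            log_w[a] += eta_primal * (rewards[t][a] - lam * (costs[t][a] - rho))

        # Dual update: raise the price when consumption exceeds rho, then project.
        lam = min(lam_max, max(0.0, lam + eta_dual * (c - rho)))

    return total_reward, remaining, min_remaining, lam


if __name__ == "__main__":
    T = 200
    # Arm 0 replenishes, arm 1 is rewarding but costly, arm 2 is in between.
    rewards = [[0.0, 1.0, 0.5] for _ in range(T)]
    costs = [[-0.5, 1.0, 0.2] for _ in range(T)]
    print(primal_dual_sketch(rewards, costs, budget=20))
```

Under the safe-arm assumption the remaining budget never becomes negative: an arbitrary arm is played only when at least one unit is left, and the safe arm can only add to the budget.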