We consider a non-stationary Bandits with Knapsack problem. The outcome distribution at each time step is scaled by a non-stationary quantity that represents the changing demand volume. Instead of studying settings with limited non-stationarity, we investigate how online predictions of the total demand volume $Q$ allow us to improve our performance guarantees. We show that, without any prediction, any online algorithm incurs regret linear in $T$. In contrast, with online predictions of $Q$, we propose an online algorithm that judiciously incorporates the predictions and achieves regret bounds that depend on the accuracy of the predictions. These bounds are shown to be tight in settings where prediction accuracy improves over time. Our theoretical results are corroborated by our numerical findings.