We consider a general online resource allocation model with bandit feedback and time-varying demands. While online resource allocation has been well studied in the literature, most existing works make the strong assumption that the demand arrival process is stationary. In practical applications such as online advertising and revenue management, however, this process may be exogenous and non-stationary, like constantly changing internet traffic. Motivated by the recent Online Algorithms with Advice framework [Mitzenmacher and Vassilvitskii, \emph{Commun. ACM} 2022], we explore how online advice can inform policy design. We establish an impossibility result showing that, without any advice, any algorithm performs poorly in terms of regret in our setting. In contrast, we design a robust online algorithm that leverages online predictions of the total demand volumes. Equipped with online advice, our proposed algorithm achieves both theoretical performance guarantees and promising numerical results compared with other algorithms in the literature. We also provide two explicit examples of time-varying demand scenarios and derive the corresponding theoretical performance guarantees. Finally, we adapt our model to a network revenue management problem and numerically demonstrate that our algorithm still performs competitively against existing baselines.