We propose a novel algorithm for online resource allocation with non-stationary customer arrivals and unknown click-through rates. We assume that multiple types of customers arrive in a non-stationary stochastic fashion, with unknown arrival rates in each period, and that customers' click-through rates are unknown and can only be learned online. By leveraging results from stochastic contextual bandits with knapsacks and online matching with adversarial arrivals, we develop an online scheme that allocates resources to customers whose arrivals are non-stationary. We prove that, under mild conditions, our scheme achieves a ``best-of-both-worlds'' result: it attains sublinear regret when customer arrivals are near-stationary, and an optimal competitive ratio under general (non-stationary) customer arrival distributions. Finally, we conduct extensive numerical experiments showing that our approach generates near-optimal revenues across all customer scenarios tested.