We study an online dynamic pricing problem in which the potential demand in each time period $t=1,2,\ldots,T$ is stochastic and depends on the price. A perishable inventory level is imposed at the beginning of each period $t$, and it censors the potential demand whenever that demand exceeds the inventory. To address this problem, we introduce a pricing algorithm based on optimistic estimates of derivatives. We show that the algorithm achieves the optimal $\tilde{O}(\sqrt{T})$ regret even when the inventory sequence is adversarial. Our findings advance the state of the art in online decision-making with censored feedback, offering a theoretically optimal solution against adversarial observations.
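
As an illustrative formalization of this setting (the notation here is assumed for exposition and is not taken from the abstract itself), write $p_t$ for the posted price, $D_t(p_t)$ for the stochastic potential demand, and $\gamma_t$ for the inventory level in period $t$. Censoring then means that only the realized sales, and the revenue they generate, are observed:

$$
S_t = \min\{D_t(p_t),\ \gamma_t\}, \qquad r_t(p_t) = p_t \cdot S_t .
$$

Under one natural reading, in which the benchmark is the best price for each period's inventory level, the regret would be

$$
\mathrm{Reg}(T) = \sum_{t=1}^{T} \Big( \max_{p}\ \mathbb{E}\big[r_t(p)\big] - \mathbb{E}\big[r_t(p_t)\big] \Big),
$$

and the stated guarantee corresponds to $\mathrm{Reg}(T) = \tilde{O}(\sqrt{T})$ for any inventory sequence $\gamma_1,\ldots,\gamma_T$.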