We study an online dynamic pricing problem in which the potential demand in each time period $t=1,2,\ldots,T$ is stochastic and price-dependent. In addition, a perishable inventory level is imposed at the beginning of each period $t$. It censors the potential demand whenever demand exceeds that level, so the seller observes only the truncated sales. To address this problem, we introduce a pricing algorithm based on optimistic estimates of derivatives. We show that our algorithm achieves the optimal $\tilde{O}(\sqrt{T})$ regret even when the inventory sequence is adversarial. Our findings advance the state of the art in online decision-making with censored feedback, offering a theoretically optimal solution even under adversarial observations.
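The censored-feedback setting above can be made concrete with a small simulation of a single period: the seller posts a price, a stochastic price-dependent demand is drawn, and only $\min(\text{demand}, \text{inventory})$ is revealed. This is a minimal sketch of the observation model only, not the paper's pricing algorithm. The linear-plus-Gaussian demand model, the constants `A`, `B`, `NOISE_SD`, the helper names, and the inventory sequence are all hypothetical choices made for illustration.

```python
import random

# Hypothetical demand parameters (not from the paper): D = A - B * p + noise.
A, B, NOISE_SD = 10.0, 1.0, 1.0


def potential_demand(price: float, rng: random.Random) -> float:
    """Draw a stochastic, price-dependent potential demand (clipped at 0)."""
    return max(0.0, A - B * price + rng.gauss(0.0, NOISE_SD))


def observe_sales(price: float, inventory: float, rng: random.Random) -> float:
    """Censored feedback: only min(demand, inventory) is revealed to the seller."""
    return min(potential_demand(price, rng), inventory)


def period_revenue(price: float, sales: float) -> float:
    """Revenue earned in one period from the units actually sold."""
    return price * sales


if __name__ == "__main__":
    rng = random.Random(0)
    price = 4.0  # expected uncensored demand at this price is A - B * price = 6
    # An arbitrary, hypothetical inventory sequence standing in for an adversarial one.
    for t, inventory in enumerate([2.0, 8.0, 3.0, 10.0, 1.0], start=1):
        sales = observe_sales(price, inventory, rng)
        stocked_out = sales == inventory  # demand may have been cut off
        print(f"t={t} inventory={inventory} sales={sales:.2f} "
              f"revenue={period_revenue(price, sales):.2f} stocked_out={stocked_out}")
```

When the inventory is tight (for example 2 units against an expected demand of 6), the observed sales systematically understate the true demand, which is why a naive estimator that treats sales as demand would be biased under censoring.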