We study non-stationary single-item, periodic-review inventory control problems in which the demand distribution is unknown and may change over time. We analyze how demand non-stationarity affects learning performance across inventory models, including systems with demand backlogging or lost sales, both with and without lead times. For each setting, we propose an adaptive online algorithm that optimizes over the class of base-stock policies and establish performance guarantees in terms of dynamic regret relative to the optimal base-stock policy at each time step. Our results reveal a sharp separation across inventory models. In backlogging systems and lost-sales models with zero lead time, we show that it is possible to adapt to demand changes without incurring additional performance loss in stationary environments, even without prior knowledge of the demand distributions or the number of demand shifts. In contrast, for lost-sales systems with positive lead times, we establish weaker guarantees that reflect fundamental limitations imposed by delayed replenishment in combination with censored feedback. Our algorithms leverage the convexity and one-sided feedback structure of inventory costs to enable counterfactual policy evaluation despite demand censoring. We complement the theoretical analysis with simulation results showing that our methods significantly outperform existing benchmarks.
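To make the one-sided feedback idea concrete, the following is a minimal sketch for the lost-sales setting with zero lead time. It is an illustration under stated assumptions, not the algorithm analyzed in the paper. It assumes a per-period cost of `h` per unit of leftover stock and `p` per unit of lost sales, and it works with a pseudo-cost that equals the true cost minus `p * demand`. That offset does not depend on the base-stock level, so comparisons between levels are unaffected. The function names and the parameters `h` and `p` are hypothetical. The point shown is that sales observed at the played level determine the sales, and hence the pseudo-cost, of every lower level, while higher levels remain unidentified.

```python
def pseudo_cost(level, sales_at_level, h, p):
    """Pseudo-cost h*(level - D)^+ - p*min(D, level).

    Equals the true lost-sales cost h*(level - D)^+ + p*(D - level)^+
    minus p*D, a term that does not depend on the level.
    Only sales = min(D, level) is needed, not the demand D itself.
    """
    return h * (level - sales_at_level) - p * sales_at_level


def counterfactual_pseudo_costs(played_level, observed_sales, candidates, h, p):
    """Evaluate candidate base-stock levels from censored sales.

    observed_sales is min(D, played_level). For any candidate s at or
    below played_level, min(D, s) == min(observed_sales, s), so its
    pseudo-cost is recoverable. Candidates above played_level are
    skipped because censoring hides the demand they would have served.
    """
    costs = {}
    for s in candidates:
        if s > played_level:
            continue  # not identifiable from censored feedback
        sales_s = min(observed_sales, s)
        costs[s] = pseudo_cost(s, sales_s, h, p)
    return costs


if __name__ == "__main__":
    # Hypothetical numbers: true demand 15 is censored at the played level 10.
    print(counterfactual_pseudo_costs(10, min(15, 10), [4, 8, 12], h=1.0, p=3.0))
```

In this sketch, playing a higher level yields information about more candidates, which is one way to read the one-sided structure mentioned above.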