Randomized experiments, or A/B tests, are the gold standard for evaluating interventions, yet they remain underutilized in inventory management. This study addresses this gap by analyzing A/B testing strategies in multi-item, multi-period inventory systems with lost sales and capacity constraints. We examine two canonical experimental designs, switchback experiments and item-level randomization, and show that both suffer from systematic bias due to interference: temporal carryover in switchbacks and cannibalization across items under capacity constraints. Under mild conditions, we characterize the direction of this bias, proving that switchback designs systematically underestimate the global treatment effect, while item-level randomization systematically overestimates it. Motivated by two-sided randomization, we propose a pairwise design over items and time and analyze its bias properties. Numerical experiments using real-world data validate our theory and provide concrete guidance for selecting experimental designs in practice.
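To make the designs named above concrete, the following is a minimal sketch of how treatment assignments over an items-by-periods grid might be generated. All function names, the block length, and the fair-coin randomization are illustrative assumptions rather than the paper's specification. In particular, `two_sided_assignment` shows one common form of two-sided randomization (a cell is treated only when both its item and its period are drawn as treated); it is not claimed to be the pairwise design proposed in this study.

```python
import random


def switchback_assignment(n_items, n_periods, block_len, rng):
    """Switchback: every item shares one arm within a time block;
    the arm is redrawn independently for each block (assumed fair coin)."""
    assign = []
    for start in range(0, n_periods, block_len):
        arm = rng.randint(0, 1)
        n_in_block = min(block_len, n_periods - start)
        assign.extend([arm] * n_items for _ in range(n_in_block))
    return assign  # assign[t][i] is the arm of item i in period t


def item_level_assignment(n_items, n_periods, rng):
    """Item-level randomization: each item draws one arm and keeps it
    for the whole horizon (assumed fair coin)."""
    item_arm = [rng.randint(0, 1) for _ in range(n_items)]
    return [list(item_arm) for _ in range(n_periods)]


def two_sided_assignment(n_items, n_periods, rng):
    """Two-sided randomization (illustrative assumption, not the paper's
    pairwise design): items and periods are randomized independently and
    a cell is treated only if both its item and its period are treated."""
    item_arm = [rng.randint(0, 1) for _ in range(n_items)]
    period_arm = [rng.randint(0, 1) for _ in range(n_periods)]
    return [[item_arm[i] * period_arm[t] for i in range(n_items)]
            for t in range(n_periods)]


if __name__ == "__main__":
    rng = random.Random(0)
    for name, grid in [
        ("switchback", switchback_assignment(4, 6, 2, rng)),
        ("item-level", item_level_assignment(4, 6, rng)),
        ("two-sided", two_sided_assignment(4, 6, rng)),
    ]:
        print(name)
        for row in grid:
            print(" ", row)
```

In this sketch, the switchback grid is constant across items within a period, so any carryover of inventory between adjacent blocks can contaminate the comparison. The item-level grid is constant across periods within an item, so treated and control items compete for the same shared capacity in every period.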