Online Linear Programming with Replenishment

We study an online linear programming (OLP) model in which inventory is not provided upfront but instead arrives gradually through an exogenous stochastic replenishment process. This replenishment-based formulation captures operational settings, such as e-commerce fulfillment, perishable supply chains, and renewable-powered systems, where resources are accumulated gradually and initial inventories are small or zero. The introduction of dispersed, uncertain replenishment fundamentally alters the structure of classical OLPs, creating persistent stockout risk and eliminating advance knowledge of the total budget. We develop new algorithms and regret analyses for three major distributional regimes studied in the OLP literature: bounded distributions, finite-support distributions, and continuous-support distributions with a non-degeneracy condition. For bounded distributions, we design an algorithm that achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret. For finite-support distributions with a non-degenerate induced LP, we obtain $\mathcal{O}(\log T)$ regret, and we establish an $\Omega(\sqrt{T})$ lower bound for degenerate instances, demonstrating a sharp separation from the classical setting where $\mathcal{O}(1)$ regret is achievable. For continuous-support, non-degenerate distributions, we develop a two-stage accumulate-then-convert algorithm that achieves $\mathcal{O}(\log^2 T)$ regret, comparable to the $\mathcal{O}(\log T)$ regret in classical OLPs. Together, these results provide a near-complete characterization of the optimal regret achievable in OLP with replenishment. Finally, we empirically evaluate our algorithms and demonstrate their advantages over natural adaptations of classical OLP methods in the replenishment setting.
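To make the replenishment setting concrete, the following is a minimal illustrative sketch, not one of the algorithms analyzed here. It assumes a single resource, zero initial inventory, i.i.d. uniform replenishment and requests, replenishment arriving before each request, and a fixed threshold price; the function name `simulate`, the distributions, and the acceptance rule are all hypothetical choices made for illustration only. It shows how a request that a price-based rule would accept can still be lost to a stockout when inventory accrues over time.

```python
import random


def simulate(T, price, seed=0):
    """Run a fixed-price threshold policy on a toy single-resource instance.

    Assumptions (illustrative only): inventory starts at zero, replenishment
    arrives before each request, and all quantities are i.i.d. uniform.
    Returns (total_reward, final_inventory, stockouts).
    """
    rng = random.Random(seed)
    inventory = 0.0   # no upfront budget
    reward = 0.0
    stockouts = 0     # requests the rule wanted to accept but could not serve

    for _ in range(T):
        # Exogenous stochastic replenishment for this period.
        inventory += rng.uniform(0.0, 1.0)

        # Incoming request: reward r and resource consumption a.
        r = rng.uniform(0.0, 1.0)
        a = rng.uniform(0.5, 1.5)

        # Accept only if the reward covers the priced resource cost.
        if r >= price * a:
            if a <= inventory:
                inventory -= a
                reward += r
            else:
                stockouts += 1

    return reward, inventory, stockouts


if __name__ == "__main__":
    total, left, missed = simulate(T=1000, price=0.5)
    print(f"reward={total:.2f} leftover={left:.2f} stockouts={missed}")
```

In this sketch the price is held fixed for simplicity. A price set too low exhausts inventory and produces stockouts, while one set too high leaves replenished inventory unused.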
