Advances in computational power and AI have increased interest in reinforcement learning approaches to inventory management. This paper provides a theoretical foundation for these approaches and investigates the benefits of restricting attention to policy structures that are well established in inventory theory. In particular, we prove generalization guarantees for learning several well-known classes of inventory policies, including base-stock and (s, S) policies, by leveraging the celebrated Vapnik-Chervonenkis (VC) theory. We apply the pseudo-dimension and fat-shattering dimension from VC theory to bound the generalization error of inventory policies, that is, the difference between an inventory policy's performance on training data and its expected performance on unseen data. We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences and make no assumptions such as independence over time. We corroborate our supervised learning results using numerical simulations. Managerially, our theory and simulations translate to the following insights. First, there is a principle of "learning less is more" in inventory management: depending on the amount of data available, it may be beneficial to restrict oneself to a simpler, albeit suboptimal, class of inventory policies to minimize overfitting errors. Second, the number of parameters in a policy class may not be the correct measure of overfitting error: in fact, the class of policies defined by T time-varying base-stock levels exhibits a generalization error an order of magnitude lower than that of the two-parameter (s, S) policy class. Finally, our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines, instead of having these machines directly learn the order quantity actions.
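To make the notion of generalization error concrete, the following is a minimal sketch, not the paper's experimental setup. It fits a single stationary base-stock level on a handful of training demand sequences and compares its average training cost with its average cost on fresh sequences, which stands in for the expected cost on unseen data. Everything specific here is an assumption made for illustration: zero lead time, backlogged demand, the cost parameters `h` and `b`, the clipped random-walk demand model in `sample_sequence`, and the grid of candidate levels.

```python
import random


def simulate_cost(demands, level, h=1.0, b=2.0):
    """Total cost of a stationary base-stock policy on one demand sequence.

    Assumptions (illustrative only): zero lead time, unmet demand is
    backlogged, holding cost h and backlog cost b per unit per period.
    """
    x = 0.0  # inventory level before ordering; negative means backlog
    total = 0.0
    for d in demands:
        order = max(level - x, 0.0)  # order up to the base-stock level
        x = x + order - d            # demand is realized after ordering
        total += h * max(x, 0.0) + b * max(-x, 0.0)
    return total


def average_cost(sequences, level, h=1.0, b=2.0):
    """Average total cost of the policy over a set of demand sequences."""
    return sum(simulate_cost(s, level, h, b) for s in sequences) / len(sequences)


def fit_base_stock(train, candidates, h=1.0, b=2.0):
    """Pick the candidate level with the lowest average training cost."""
    return min(candidates, key=lambda s: average_cost(train, s, h, b))


def sample_sequence(rng, horizon=12):
    """Hypothetical demand model: a random walk clipped at zero, so demands
    are correlated over time rather than independent."""
    d = rng.uniform(5, 15)
    seq = []
    for _ in range(horizon):
        d = max(0.0, d + rng.uniform(-3, 3))
        seq.append(d)
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    train = [sample_sequence(rng) for _ in range(20)]
    test = [sample_sequence(rng) for _ in range(2000)]  # proxy for the expectation
    candidates = [0.5 * k for k in range(61)]           # levels 0.0, 0.5, ..., 30.0

    best = fit_base_stock(train, candidates)
    train_cost = average_cost(train, best)
    test_cost = average_cost(test, best)
    print(f"fitted level: {best}")
    print(f"train cost:   {train_cost:.2f}")
    print(f"test cost:    {test_cost:.2f}")
    print(f"estimated generalization gap: {test_cost - train_cost:.2f}")
```

Rerunning the script with a smaller or larger training set gives a rough feel for how the gap depends on the amount of data. Comparing policy classes, as the abstract does, would require replacing `fit_base_stock` with a fit over a richer class such as time-varying levels or (s, S) pairs.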