In this study, we analyze and compare the performance of state-of-the-art deep reinforcement learning algorithms for solving the supply chain inventory management problem. This complex sequential decision-making problem consists of determining the optimal quantity of products to produce and ship to different warehouses over a given time horizon. In particular, we present a mathematical formulation of a two-echelon supply chain environment with stochastic and seasonal demand that can handle an arbitrary number of warehouses and product types. Through an extensive set of numerical experiments, we compare the performance of different deep reinforcement learning algorithms under various supply chain structures, topologies, demands, capacities, and costs. The experimental results indicate that deep reinforcement learning algorithms outperform traditional inventory management strategies, such as the static (s, Q)-policy. Furthermore, this study describes in detail the design and development of an open-source software library that offers a customizable environment for solving the supply chain inventory management problem with a wide range of data-driven approaches.
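For readers unfamiliar with the static (s, Q)-policy used as the baseline, the following is a minimal sketch of the rule for a single warehouse and a single product: whenever the stock level is at or below the reorder point s, a fixed quantity Q is ordered. The function names, the zero lead time, the lost-sales treatment of unmet demand, and the use of `<=` as the reorder trigger are all illustrative assumptions, not details taken from the study or its library.

```python
from typing import List, Tuple


def sq_order(stock: int, s: int, q: int) -> int:
    """Static (s, Q) rule: order the fixed quantity q when stock is at or below s."""
    # Assumption: the reorder trigger is "stock <= s"; some texts use "stock < s".
    return q if stock <= s else 0


def simulate_sq(
    demands: List[int], s: int, q: int, initial_stock: int
) -> Tuple[List[int], List[int]]:
    """Simulate the (s, Q) rule over a demand sequence (hypothetical helper).

    Returns the order placed in each period and the stock left at the end
    of each period.
    """
    stock = initial_stock
    orders: List[int] = []
    stocks: List[int] = []
    for demand in demands:
        order = sq_order(stock, s, q)
        stock += order                   # assumption: zero lead time
        stock = max(stock - demand, 0)   # assumption: unmet demand is lost
        orders.append(order)
        stocks.append(stock)
    return orders, stocks


if __name__ == "__main__":
    orders, stocks = simulate_sq([3, 4, 2, 5], s=2, q=6, initial_stock=5)
    print(orders, stocks)
```

Because s and Q are fixed in advance, the rule cannot adapt to seasonal demand; this is the kind of limitation a learned policy can address by adjusting order quantities over time.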