The COVID-19 pandemic has highlighted the importance of supply chains and the role of digital management in reacting to dynamic changes in the environment. In this work, we focus on developing dynamic inventory ordering policies for a multi-echelon (i.e., multi-stage) supply chain. Traditional inventory optimization methods aim to determine a static reordering policy and therefore cannot adjust to dynamic changes such as those observed during the COVID-19 crisis. On the other hand, these conventional strategies are interpretable, which is crucial for supply chain managers who must communicate decisions to their stakeholders. To reconcile flexibility with interpretability, we propose an interpretable reinforcement learning approach that aims to be as interpretable as traditional static policies while being as flexible and environment-agnostic as other deep learning-based reinforcement learning solutions. Specifically, we use Neural Additive Models as an interpretable dynamic policy for a reinforcement learning agent and show that this approach is competitive with a standard fully connected policy. Finally, we use the interpretability property to gain insights into a complex ordering strategy for a simple, linear three-echelon inventory supply chain.
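A Neural Additive Model is interpretable because it computes its output as a bias plus a sum of per-feature terms, $\hat{a} = \beta + \sum_i f_i(x_i)$. Each $f_i$ is a small network that only sees feature $x_i$, so each term can be inspected on its own. The following minimal NumPy sketch shows this forward pass. It is not the authors' implementation: it assumes one ReLU hidden layer per feature, random untrained weights, and four hypothetical state features for a single echelon. It also omits the reinforcement learning training loop entirely.

```python
import numpy as np


class NAMPolicy:
    """Hypothetical NAM-style policy: output = bias + sum_i f_i(x_i)."""

    def __init__(self, n_features: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One tiny 1 -> hidden -> 1 ReLU network per input feature (untrained, random weights).
        self.W1 = rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.b1 = rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.W2 = rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.bias = 0.0

    def contributions(self, x: np.ndarray) -> np.ndarray:
        # x has shape (n_features,); each feature passes only through its own subnetwork.
        h = np.maximum(0.0, x[:, None] * self.W1 + self.b1)  # (n_features, hidden)
        return (h * self.W2).sum(axis=1)  # one scalar contribution per feature

    def __call__(self, x: np.ndarray) -> float:
        # The policy output is the bias plus the sum of the per-feature contributions.
        return float(self.bias + self.contributions(x).sum())


# Hypothetical state: on-hand inventory, backlog, in-transit orders, last observed demand.
policy = NAMPolicy(n_features=4)
state = np.array([12.0, 3.0, 8.0, 5.0])
print(policy(state))                # raw policy output (e.g. an order-quantity signal)
print(policy.contributions(state))  # how much each feature pushed that output up or down
```

Because the model is additive, changing a single feature changes only that feature's contribution. Plotting each $f_i$ over its input range therefore gives a per-feature "shape function" that a supply chain manager can read directly. A fully connected policy offers no such view.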