The COVID-19 pandemic has highlighted the importance of supply chains and the role of digital management in reacting to dynamic changes in the environment. In this work, we focus on developing dynamic inventory ordering policies for a multi-echelon, i.e., multi-stage, supply chain. Traditional inventory optimization methods aim to determine a static reordering policy. As a result, these policies cannot adjust to dynamic changes such as those observed during the COVID-19 crisis. On the other hand, conventional strategies offer the advantage of being interpretable, which is a crucial feature for supply chain managers who must communicate decisions to their stakeholders. To address the lack of adaptability without sacrificing interpretability, we propose an interpretable reinforcement learning approach that aims to be as interpretable as traditional static policies while remaining as flexible and environment-agnostic as other deep learning-based reinforcement learning solutions. Specifically, we use Neural Additive Models as the interpretable dynamic policy of a reinforcement learning agent and show that this approach is competitive with a standard fully connected policy. Finally, we use the interpretability property to gain insights into a complex ordering strategy for a simple, linear three-echelon inventory supply chain.
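To make the notion of an additive, interpretable policy concrete, the following is a minimal sketch of a Neural Additive Model used as a policy. It is not the authors' implementation. It assumes a state made of three scalar features (hypothetically, one inventory level per echelon) and a single scalar action (an order quantity); the names `make_feature_net`, `NAMPolicy`, `contributions`, and `act` are illustrative, and the weights are random rather than trained. The point it illustrates is only the structure: the action is a bias plus a sum of per-feature terms, so each feature's influence can be read off separately.

```python
import random


def make_feature_net(hidden, rng):
    """Build one tiny ReLU network for a single scalar feature.

    Weights are random and untrained; a real agent would learn them.
    """
    w1 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]

    def net(x):
        # Hidden layer with ReLU, followed by a linear readout to a scalar.
        return sum(w2[j] * max(0.0, w1[j] * x + b1[j]) for j in range(hidden))

    return net


class NAMPolicy:
    """Additive policy: action = bias + sum_i f_i(state_i)."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = random.Random(seed)
        # One independent sub-network per state feature.
        self.nets = [make_feature_net(hidden, rng) for _ in range(n_features)]
        self.bias = 0.0

    def contributions(self, state):
        # Per-feature terms; these are what makes the policy inspectable.
        return [net(x) for net, x in zip(self.nets, state)]

    def act(self, state):
        return self.bias + sum(self.contributions(state))


# Hypothetical state: inventory levels at three echelons.
policy = NAMPolicy(n_features=3)
state = [10.0, 5.0, 2.0]
print(policy.contributions(state), policy.act(state))
```

Because the sub-networks do not share inputs, changing one feature changes only its own contribution, which is the property that allows each learned function to be plotted and inspected on its own.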