Value iteration can find the optimal replenishment policy for a perishable inventory problem, but it is computationally demanding because representing the age profile of stock requires a large state space. The parallel processing capabilities of modern GPUs can reduce the wall-clock time of value iteration by updating many states simultaneously. Adoption of GPU-accelerated approaches has nevertheless been limited in operational research relative to fields such as machine learning, where new software frameworks have made GPU programming widely accessible. We used the Python library JAX to implement value iteration, and simulators of the underlying Markov decision processes, in a high-level API, relying on the library's function transformations and compiler to use GPU hardware efficiently. This approach can extend value iteration to settings previously considered infeasible or impractical. We demonstrate it on example scenarios from three recent studies, which include problems with over 16 million states and additional features, such as substitution between products, that increase computational complexity. We compare the optimal replenishment policies with heuristic policies fitted using simulation optimization in JAX, which allowed multiple candidate policy parameters to be evaluated in parallel on thousands of simulated years. The heuristic policies had a maximum optimality gap of 2.49%. Our general approach may be applicable to a wide range of operational research problems that would benefit from large-scale parallel computation on consumer-grade GPU hardware.
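To make the idea of updating many states simultaneously concrete, the following is a minimal sketch of value iteration in JAX on a deliberately tiny perishable inventory problem. It is not the implementation used in the study. The problem itself is an assumption made for illustration: a three-period shelf life, zero lead time, oldest-first issuing, truncated Poisson demand, and arbitrary cost parameters. All names (`step`, `q_value`, `bellman_update`, `value_iteration`) are hypothetical. Only the use of `jax.vmap` to batch the Bellman update over states and actions, and `jax.jit` to compile it, reflects the general approach described above.

```python
import jax
import jax.numpy as jnp

# Illustrative parameters (assumptions, not taken from any study).
MAX_ORDER = 4      # largest order quantity; also caps each age bucket
MAX_DEMAND = 8     # demand is truncated to 0..MAX_DEMAND
GAMMA = 0.95       # discount factor
ORDER_COST, WASTE_COST, SHORTAGE_COST, HOLD_COST = 1.0, 2.0, 5.0, 0.1

n = MAX_ORDER + 1
grid = jnp.arange(n)
# State (x1, x2): units with 1 and 2 periods of remaining life before ordering.
states = jnp.stack(jnp.meshgrid(grid, grid, indexing="ij"), axis=-1).reshape(-1, 2)
actions = jnp.arange(n)
demands = jnp.arange(MAX_DEMAND + 1)
# Truncated Poisson(mean 2) probabilities; softmax normalises exp(log_p).
log_p = demands * jnp.log(2.0) - jax.scipy.special.gammaln(demands + 1.0)
demand_probs = jax.nn.softmax(log_p)


def step(state, order, demand):
    """One period: order arrives, demand is met oldest-first, old stock expires."""
    x1, x2 = state[0], state[1]
    used1 = jnp.minimum(x1, demand)
    used2 = jnp.minimum(x2, demand - used1)
    used3 = jnp.minimum(order, demand - used1 - used2)
    unmet = demand - used1 - used2 - used3
    expired = x1 - used1
    next_state = jnp.stack([x2 - used2, order - used3])
    cost = (ORDER_COST * order + WASTE_COST * expired
            + SHORTAGE_COST * unmet + HOLD_COST * next_state.sum())
    return next_state, -cost


def q_value(state, order, values):
    """Expected one-step reward plus discounted next value, over all demands."""
    next_states, rewards = jax.vmap(step, in_axes=(None, None, 0))(state, order, demands)
    next_values = values[next_states[:, 0], next_states[:, 1]]
    return jnp.dot(demand_probs, rewards + GAMMA * next_values)


@jax.jit
def bellman_update(values):
    """Evaluate every (state, action) pair in one batched call."""
    over_actions = jax.vmap(q_value, in_axes=(None, 0, None))
    over_states = jax.vmap(over_actions, in_axes=(0, None, None))
    q = over_states(states, actions, values)          # shape (n * n, n)
    return q.max(axis=1).reshape(n, n), q.argmax(axis=1).reshape(n, n)


def value_iteration(tol=1e-3, max_iter=1000):
    values = jnp.zeros((n, n))
    for _ in range(max_iter):
        new_values, policy = bellman_update(values)
        delta = jnp.max(jnp.abs(new_values - values))
        values = new_values
        if delta < tol:
            break
    return values, policy


values, policy = value_iteration()
print(policy)  # order quantity for each (x1, x2) state
```

The nested `jax.vmap` calls are what allow the compiler to map the update for every state and action onto parallel hardware; the same code runs on a CPU when no GPU is present. In this sketch the convergence loop is left in plain Python for readability, although it could be moved into `jax.lax.while_loop` so that the whole procedure is compiled.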