Most solutions to the inventory management problem assume a centralization of information that is incompatible with organisational constraints in real supply chain networks. The inventory management problem is a well-known planning problem in operations research, concerned with finding the optimal re-order policy for nodes in a supply chain. While many centralized solutions to the problem exist, they are not applicable to real-world supply chains made up of independent entities. The problem can, however, be naturally decomposed into sub-problems, each associated with an independent entity, turning it into a multi-agent system. Therefore, a decentralized data-driven solution to inventory management problems using multi-agent reinforcement learning is proposed, in which each entity is controlled by an agent. Three multi-agent variations of the proximal policy optimization algorithm are investigated through simulations of different supply chain networks and levels of uncertainty. The centralized training, decentralized execution framework is deployed, which relies on offline centralization during simulation-based policy identification but enables decentralization when the policies are deployed online to the real system. Results show that multi-agent proximal policy optimization with a centralized critic achieves performance very close to that of a centralized data-driven solution and, in most cases, outperforms a distributed model-based solution, while respecting the information constraints of the system.
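The information structure behind centralized training with decentralized execution can be illustrated with a minimal sketch. It is not the method evaluated in this work: the `LocalObs` fields, the function names, and the order-up-to rule standing in for a learned policy are all assumptions made for illustration. The point is only that each agent's ordering decision reads its own node's state, whereas the critic used during offline training may read the joint state of every node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalObs:
    """Hypothetical local state of one supply chain node (assumed fields)."""
    on_hand: float      # units currently in stock
    backlog: float      # unmet demand owed downstream
    in_transit: float   # units ordered but not yet received


def local_order(obs: LocalObs, base_stock: float) -> float:
    """Decentralized execution: the decision uses this node's observation only.

    An order-up-to rule is used here as a stand-in for a learned policy.
    """
    inventory_position = obs.on_hand + obs.in_transit - obs.backlog
    return max(0.0, base_stock - inventory_position)


def critic_input(all_obs: List[LocalObs]) -> List[float]:
    """Centralized training: the critic sees every node's observation.

    This joint vector is only needed offline, in simulation.
    """
    joint: List[float] = []
    for obs in all_obs:
        joint.extend([obs.on_hand, obs.backlog, obs.in_transit])
    return joint


if __name__ == "__main__":
    chain = [LocalObs(5.0, 2.0, 1.0), LocalObs(12.0, 0.0, 3.0)]
    orders = [local_order(o, base_stock=10.0) for o in chain]
    print(orders)                      # [6.0, 0.0]
    print(len(critic_input(chain)))    # 6
```

At deployment only `local_order` would be called, once per node, so no entity needs access to another entity's inventory data.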