We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items. The fundamental problem we tackle, called the order-picking problem, is how these worker agents should coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods rely on heuristics and require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to cooperate optimally with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained to maximise a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.
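To make the manager/worker split concrete, the following is a minimal illustrative sketch of the control loop only, not the algorithms developed in this work. It assumes a grid warehouse, and it uses hand-written greedy rules as stand-ins for the learned manager and worker policies. All names (`Worker`, `manager_assign`, `worker_step`, `run_episode`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pos = Tuple[int, int]


@dataclass
class Worker:
    pos: Pos
    goal: Optional[Pos] = None  # item location assigned by the manager


def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def manager_assign(workers: List[Worker], pending: set) -> None:
    """Stand-in for a learned manager policy (assumption):
    give each idle worker the nearest item nobody else is heading to."""
    taken = {w.goal for w in workers if w.goal is not None}
    for w in workers:
        if w.goal is None:
            free = [p for p in sorted(pending) if p not in taken]
            if free:
                w.goal = min(free, key=lambda p: manhattan(w.pos, p))
                taken.add(w.goal)


def worker_step(w: Worker) -> None:
    """Stand-in for a learned worker policy (assumption):
    move one grid cell toward the assigned goal."""
    if w.goal is None:
        return
    x, y = w.pos
    gx, gy = w.goal
    if x != gx:
        x += 1 if gx > x else -1
    elif y != gy:
        y += 1 if gy > y else -1
    w.pos = (x, y)


def run_episode(workers: List[Worker], items: List[Pos], max_steps: int = 100):
    """Run one episode; return (picks, steps). picks / steps is the pick rate,
    the kind of global objective both policy levels would be trained on."""
    pending = set(items)
    picks, steps = 0, 0
    for t in range(max_steps):
        steps = t + 1
        manager_assign(workers, pending)       # high level: goal assignment
        for w in workers:
            worker_step(w)                     # low level: movement
            if w.goal is not None and w.pos == w.goal:
                pending.discard(w.goal)        # item picked
                picks += 1
                w.goal = None                  # worker becomes idle again
        if not pending:
            break
    return picks, steps


if __name__ == "__main__":
    team = [Worker((0, 0)), Worker((5, 5))]
    picks, steps = run_episode(team, [(0, 3), (5, 2), (2, 2)])
    print(f"picks={picks} steps={steps} pick_rate={picks / steps:.2f}")
```

In a hierarchical MARL setting, `manager_assign` and `worker_step` would each be replaced by a trainable policy, and the shared pick-rate signal would drive the updates at both levels.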