We present a novel reinforcement learning based algorithm for the multi-robot task allocation problem in warehouse environments. We formulate the problem as a Markov Decision Process and solve it with a novel deep multi-agent reinforcement learning method (called RTAW) that uses an attention-inspired policy architecture. As a result, the proposed policy network uses global embeddings that are independent of the number of robots and tasks. We train with the proximal policy optimization algorithm and use a carefully designed reward to obtain a converged policy. The converged policy ensures cooperation among the robots to minimize total travel delay (TTD), which ultimately improves the makespan for a sufficiently large task list. In our extensive experiments, we compare the performance of RTAW to state-of-the-art methods, namely myopic pickup distance minimization (greedy) and regret-based baselines, under different navigation schemes. We show an improvement of up to 14% (25-1000 seconds) in TTD on scenarios with hundreds or thousands of tasks across different challenging warehouse layouts and task generation schemes. We also demonstrate the scalability of our approach by showing its performance with up to 1000 robots in simulation.
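To make the greedy baseline named in the abstract concrete, the following is a minimal sketch of myopic pickup distance minimization. It is not the authors' implementation. The function names, the use of Manhattan distance on a grid, the fixed order in which robots choose, and the use of summed pickup distance as a stand-in cost are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) grid cell; an assumed representation


def manhattan(a: Point, b: Point) -> int:
    """Grid distance between two cells (assumed travel metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def allocate_greedily(
    robots: List[Point], pickups: List[Point]
) -> Tuple[Dict[int, int], int]:
    """Assign each robot, in list order, the nearest unassigned pickup.

    Returns (robot index -> task index, summed pickup distance).
    The summed distance is only a stand-in cost for this sketch.
    """
    remaining = list(range(len(pickups)))
    assignment: Dict[int, int] = {}
    total = 0
    for r, pos in enumerate(robots):
        if not remaining:
            break  # more robots than tasks
        # Myopic choice: look only at this robot's own pickup distance.
        best = min(remaining, key=lambda i: manhattan(pos, pickups[i]))
        assignment[r] = best
        total += manhattan(pos, pickups[best])
        remaining.remove(best)
    return assignment, total


if __name__ == "__main__":
    robots = [(2, 0), (0, 0)]
    pickups = [(1, 0), (4, 0)]
    print(allocate_greedily(robots, pickups))
```

In this toy instance the first robot takes the pickup at (1, 0) because it is one cell away, which leaves the second robot a four-cell trip, for a summed distance of 5. Swapping the two assignments would give 3. A gap of this kind is what a cooperative allocation policy is meant to close.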