In many domains, such as transportation and logistics, search and rescue, or cooperative surveillance, tasks must be allocated while accounting for uncertainty in their execution. Existing task coordination algorithms either ignore this stochastic process or are computationally intensive. Exploiting the weakly coupled structure of the problem and the opportunity to coordinate in advance, we propose a decentralized auction-based coordination strategy built on a newly formulated score function, which is derived by formulating the problem as task-constrained Markov decision processes (MDPs). Under the assumption of a submodular reward function, the proposed method is guaranteed to converge and to achieve at least 50% of the optimal value. Furthermore, for large-scale applications, we also propose an approximate variant, Deep Auction, which uses neural networks to avoid the burden of explicitly constructing the MDPs. Inspired by the well-known actor-critic architecture, it uses two Transformers to map observations to action probabilities and cumulative rewards, respectively. Finally, we demonstrate the performance of the two proposed approaches in the context of drone deliveries, where stochastic planning for the drone fleet is cast as a stochastic price-collecting Vehicle Routing Problem (VRP) with time windows. Simulation results are compared with state-of-the-art methods in terms of solution quality, planning efficiency, and scalability.
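To make the auction idea concrete, the following is a minimal sketch of a greedy sequential auction in which each bid is the marginal gain of adding a task to an agent's bundle. It is not the score function proposed here, which is derived from task-constrained MDPs. The names `greedy_auction`, `toy_reward`, and `VALUE`, and the numbers in the toy instance, are hypothetical and chosen only for illustration. The toy reward is the square root of a sum of task values, which is monotone and submodular, the setting in which greedy assignment of this kind is known to reach at least half of the optimal total reward.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical per-agent task values (illustrative numbers only).
VALUE: Dict[str, Dict[str, float]] = {
    "d1": {"A": 4.0, "B": 1.0, "C": 1.0},
    "d2": {"A": 3.0, "B": 3.0, "C": 0.5},
}


def toy_reward(agent: str, bundle: FrozenSet[str]) -> float:
    """Monotone submodular toy reward: sqrt of the summed task values."""
    return math.sqrt(sum(VALUE[agent][t] for t in bundle))


def greedy_auction(
    agents: List[str],
    tasks: List[str],
    reward: Callable[[str, FrozenSet[str]], float],
) -> Dict[str, Set[str]]:
    """Repeatedly award one task to the agent with the highest marginal-gain bid."""
    bundles: Dict[str, Set[str]] = {a: set() for a in agents}
    unassigned = set(tasks)
    while unassigned:
        best = None  # (gain, agent, task) of the current winning bid
        for a in agents:
            base = reward(a, frozenset(bundles[a]))
            for t in sorted(unassigned):  # sorted for deterministic tie-breaking
                gain = reward(a, frozenset(bundles[a] | {t})) - base
                if best is None or gain > best[0]:
                    best = (gain, a, t)
        gain, winner, task = best
        if gain <= 0:  # no remaining task improves any agent's reward
            break
        bundles[winner].add(task)
        unassigned.remove(task)
    return bundles


bundles = greedy_auction(["d1", "d2"], ["A", "B", "C"], toy_reward)
total = sum(toy_reward(a, frozenset(b)) for a, b in bundles.items())
print(bundles, round(total, 3))
```

This sketch runs the auction centrally for brevity. In a decentralized version, each agent would compute its own bids and a consensus step would resolve the winners.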