We consider the problem of optimal unsignalized intersection management for continual streams of randomly arriving robots. This problem involves solving many instances of a mixed integer program, for which the computation time of a naive optimization algorithm scales exponentially with the number of robots and lanes. Hence, such an approach is not suitable for real-time implementation. In this paper, we propose a solution framework that combines learning and sequential optimization. In particular, we propose an algorithm for learning a policy that, given the traffic state information, determines the crossing order of the robots. We then optimize the trajectories of the robots sequentially according to that crossing order. The proposed algorithm learns a shared policy that can be deployed in a distributed manner. We validate the performance of this approach through extensive simulations. On average, our approach significantly outperforms the heuristics from the literature and gives near-optimal solutions. The simulations also show that the computation time of our approach scales linearly with the number of robots.