We consider the problem of optimal unsignalized intersection management for continual streams of randomly arriving robots. This problem involves repeatedly solving different instances of a mixed integer program, for which the computation time of a naive optimization algorithm scales exponentially with the number of robots and lanes. Such an approach is therefore unsuitable for real-time implementation. In this paper, we propose a solution framework that combines learning and sequential optimization. In particular, we propose an algorithm for learning a shared policy that, given the traffic state information, determines the crossing order of the robots. We then optimize the trajectories of the robots sequentially according to that crossing order. This approach inherently guarantees safety at all times. We validate its performance through extensive simulations, in which it significantly outperforms, on average, the heuristics from the literature. The simulations also show that the computation time of our approach scales linearly with the number of robots. Finally, we implement the learnt policies on physical robots, with a few modifications to the solution framework to address real-world challenges, and thereby establish the framework's real-time implementability.
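The two-stage structure described in the abstract (first fix a crossing order, then plan robots one at a time in that order) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's method: the `Robot` fields, the single shared conflict zone, the fixed `crossing_time`, and the use of earliest arrival time as a stand-in for the learned policy. The per-robot "trajectory optimization" is reduced to choosing an entry time into the conflict zone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Robot:
    """Hypothetical robot state: distance to the intersection and speed limit."""
    name: str
    distance: float   # metres to the conflict-zone entry
    max_speed: float  # metres per second


def earliest_arrival(robot: Robot) -> float:
    """Earliest time the robot can reach the conflict zone at full speed."""
    return robot.distance / robot.max_speed


def crossing_order(robots: List[Robot],
                   policy: Callable[[Robot], float]) -> List[Robot]:
    """Stage 1: rank robots by a policy score (lower score crosses first).

    In the paper this role is played by a learned shared policy; here any
    scoring function can be plugged in as a placeholder.
    """
    return sorted(robots, key=policy)


def schedule_sequentially(ordered: List[Robot],
                          crossing_time: float) -> Dict[str, float]:
    """Stage 2: plan one robot at a time, in the given crossing order.

    Each robot enters as early as it can, but never before the previously
    scheduled robot has left the zone, so occupancy intervals cannot overlap.
    The loop does constant work per robot, hence linear time overall.
    """
    entry_times: Dict[str, float] = {}
    zone_free_at = 0.0
    for robot in ordered:
        entry = max(earliest_arrival(robot), zone_free_at)
        entry_times[robot.name] = entry
        zone_free_at = entry + crossing_time
    return entry_times


robots = [Robot("a", 4.0, 2.0), Robot("b", 3.0, 1.0), Robot("c", 9.0, 3.0)]
order = crossing_order(robots, policy=earliest_arrival)
times = schedule_sequentially(order, crossing_time=1.5)
print(times)
```

Because each robot is scheduled against the commitments of those before it, collision avoidance holds by construction for any crossing order; the order only affects how efficient the resulting schedule is, which is where a learned policy would matter.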