We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility-on-demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. This combination factorizes the operator's otherwise intractable action space while still yielding a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state-of-the-art benchmarks with respect to performance, stability, and computational tractability.
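To illustrate how per-agent outputs can be combined into one globally coordinated decision, the following minimal sketch performs a weighted bipartite matching between vehicles and requests. It is an assumption-laden illustration, not the method evaluated in the experiments: the `scores` matrix is a hypothetical stand-in for whatever weights the trained agents would supply, the function name `match_requests` is invented here, the rule that non-positive scores mean rejection is an assumption, and the exhaustive search is only practical for tiny instances.

```python
from typing import Dict, List, Tuple


def match_requests(scores: List[List[float]]) -> Tuple[float, Dict[int, int]]:
    """Max-weight bipartite matching by exhaustive search (tiny instances only).

    scores[v][r] is a hypothetical per-agent score for vehicle v serving
    request r. Pairs with a non-positive score are never matched, which
    this sketch interprets as a rejection (an assumption).
    """
    n_vehicles = len(scores)
    n_requests = len(scores[0]) if scores else 0

    def search(v: int, used: frozenset) -> Tuple[float, Dict[int, int]]:
        if v == n_vehicles:
            return 0.0, {}
        # Option 1: vehicle v stays unassigned.
        best_value, best_assignment = search(v + 1, used)
        # Option 2: vehicle v takes a free request with a positive score.
        for r in range(n_requests):
            if r in used or scores[v][r] <= 0.0:
                continue
            value, assignment = search(v + 1, used | {r})
            value += scores[v][r]
            if value > best_value:
                best_value = value
                best_assignment = {**assignment, v: r}
        return best_value, best_assignment

    return search(0, frozenset())


# Two vehicles, three requests; the third request is unattractive to both.
scores = [
    [4.0, 1.0, -2.0],
    [3.0, 2.0, -1.0],
]
total, assignment = match_requests(scores)
rejected = sorted(set(range(3)) - set(assignment.values()))
print(total, assignment, rejected)
```

Each vehicle contributes only its own row of scores, yet the matching ensures that no request is assigned twice. For realistic fleet sizes a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the exhaustive search.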