Two-sided markets such as ride-sharing companies often involve a group of subjects who make sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, these markets have substantially transformed the transportation landscape. In this paper, we consider large-scale fleet management in ride-sharing companies, where multiple units in different areas receive sequences of products (or treatments) over time. Policy evaluation in these studies poses major technical challenges because (i) spatial and temporal proximities induce interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for mean outcomes under different products that are consistent despite the high dimensionality of the state-action space. The proposed estimators perform favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at https://github.com/RunzheStat/CausalMARL.