Mobility systems often suffer from a high price of anarchy due to the uncontrolled behavior of selfish users. This may result in societal costs significantly higher than those achievable by a centralized system-optimal controller. Monetary tolling schemes can effectively align the behavior of selfish users with the system optimum, yet they inevitably discriminate among users on the basis of income. Artificial currencies were recently presented as an effective alternative that can achieve the same performance while guaranteeing fairness among the population. However, those studies were based on behavioral models that may differ from how users behave in practical implementations. This paper presents a data-driven approach to automatically adapt artificial-currency tolls within repetitive-game settings. First, we consider a parallel-arc setting in which users commute daily from an individual origin to an individual destination, choosing a route in exchange for an artificial-currency price or reward while accounting for the impact of the other users' choices on travel discomfort. Second, we devise a model-based reinforcement learning controller that autonomously learns the optimal pricing policy by interacting with the proposed framework, using as its reward function the closeness of the observed aggregate flows to a desired system-optimal distribution. Our numerical results show that the proposed data-driven pricing scheme can effectively align the users' flows with the system optimum, significantly reducing societal costs with respect to the uncontrolled flows (by about 15% and 25%, depending on the scenario), and can respond to environmental changes robustly and efficiently.
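To make the repeated parallel-arc game and the flow-closeness reward more concrete, the Python sketch below simulates a two-arc instance under artificial-currency prices. It is a minimal illustration under stated assumptions, not the paper's model. The linear discomfort functions, the logit route choice with sensitivity `beta`, the `switch_rate` day-to-day dynamics, the choice of L1 distance as the "closeness" measure, the idea that prices are expressed directly in discomfort units, and all parameter values are hypothetical. The function names (`simulate_day`, `reward`, `run`) do not come from the paper either. The model-based reinforcement learning controller is not shown. In this setting, such a controller would choose `prices` each day and observe `reward`.

```python
import math
from typing import List, Sequence


def arc_discomfort(flows: Sequence[float], free_flow: Sequence[float],
                   slope: Sequence[float]) -> List[float]:
    # Hypothetical linear discomfort per arc: free_flow[i] + slope[i] * flows[i].
    return [a + b * x for a, b, x in zip(free_flow, slope, flows)]


def logit_split(costs: Sequence[float], beta: float) -> List[float]:
    # Assumed logit route choice: share on arc i is proportional to exp(-beta * cost_i).
    # Subtracting the minimum cost avoids overflow without changing the shares.
    m = min(costs)
    weights = [math.exp(-beta * (c - m)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_day(flows, prices, free_flow, slope, beta=5.0, switch_rate=0.2):
    # One day of assumed day-to-day dynamics: a fraction `switch_rate` of users
    # re-choose based on yesterday's discomfort plus today's artificial-currency
    # price (positive = pay, negative = reward), assumed to be in discomfort units.
    perceived = [d + p for d, p in zip(arc_discomfort(flows, free_flow, slope), prices)]
    choice = logit_split(perceived, beta)
    return [(1 - switch_rate) * x + switch_rate * y for x, y in zip(flows, choice)]


def reward(flows, system_optimum):
    # Controller reward: negative L1 distance between observed and desired flows
    # (the metric is an assumption; 0 means the flows match the target exactly).
    return -sum(abs(x - s) for x, s in zip(flows, system_optimum))


def run(prices, days=200):
    # Hypothetical two-arc instance: arc 0 is short but congests quickly,
    # arc 1 is longer but less sensitive to congestion.
    free_flow, slope = [1.0, 1.5], [2.0, 0.5]
    flows = [0.5, 0.5]
    for _ in range(days):
        flows = simulate_day(flows, prices, free_flow, slope)
    return flows


# With these linear discomforts, equating marginal costs 1 + 4*x0 = 1.5 + (1 - x0)
# gives the system optimum x0 = 0.3, whereas deterministic selfish users
# (Wardrop equilibrium, 1 + 2*x0 = 1.5 + 0.5*(1 - x0)) would settle at x0 = 0.4.
SYSTEM_OPTIMUM = [0.3, 0.7]

if __name__ == "__main__":
    for prices in ([0.0, 0.0], [0.2, -0.2]):
        flows = run(prices)
        print(prices, [round(x, 3) for x in flows], round(reward(flows, SYSTEM_OPTIMUM), 3))
```

In this toy instance, charging on the congestion-prone arc and rewarding the other (prices `[0.2, -0.2]`) moves the flows toward the assumed system optimum. This raises the reward relative to the unpriced case. A learning controller would search over such price vectors, using only the observed flows and the reward.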