Human-in-the-loop Learning for Dynamic Congestion Games

Today, mobile users learn and share traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests by recommending the shortest path, and do not encourage enough users to travel and learn other paths for the benefit of future users. Prior studies focus on one-shot congestion games without considering users' information learning, whereas our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths. This results in a price of anarchy (PoA) greater than $2$ compared to the socially optimal policy that minimizes the long-term social cost. Moreover, the myopic policy fails to ensure correct learning convergence of users' traffic hazard beliefs. To address this, we focus on informational (non-monetary) mechanisms, as they are easier to implement than pricing. We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms in the Bayesian persuasion literature do not work, yielding even $\text{PoA}=\infty$. Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism, which hides all information from a selected user group and provides state-dependent probabilistic recommendations to the other user group. Our CHAR mechanism ensures a PoA less than $\frac{5}{4}$, which cannot be further reduced by any other informational (non-monetary) mechanism. Beyond the parallel network, we extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of CHAR.
