Human-in-the-loop Learning for Dynamic Congestion Games

Today, mobile users learn and share traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests by recommending the shortest path, and do not encourage enough users to travel and learn other paths for the benefit of future users. Prior studies focus on one-shot congestion games without considering users' information learning, whereas our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths, resulting in a price of anarchy (PoA) greater than $2$ compared with the socially optimal policy that minimizes the long-term social cost. Moreover, the myopic policy fails to ensure that users' beliefs about traffic hazards converge correctly. To address this, we focus on informational (non-monetary) mechanisms, as they are easier to implement than pricing. We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms from the Bayesian persuasion literature do not work, and can even yield $\text{PoA}=\infty$. Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism that hides all information from a selected user group and provides state-dependent probabilistic recommendations to the other user group. CHAR ensures a PoA of less than $\frac{5}{4}$, which cannot be further reduced by any other informational (non-monetary) mechanism. Beyond the parallel network, we extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of CHAR.
