Differentiable Bilevel Programming for Stackelberg Congestion Games

In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game. Often formulated as bilevel programs, large-scale SCGs are well known for their intractability and complexity. Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning. The core idea centers on replacing the lower-level equilibrium problem with a smooth evolution trajectory defined by the imitative logit dynamic (ILD), which we prove converges to the equilibrium of the congestion game under mild conditions. Building upon this theoretical foundation, we propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming. Thanks to the smoothness of ILD, the algorithm promises both efficiency and scalability. The second algorithm adds a heuristic twist by cutting short the followers' evolution trajectory. Behaviorally, this means that, instead of anticipating the followers' best response at equilibrium, the leader seeks to approximate that response by only looking ahead a limited number of steps. Our numerical experiments are carried out over various instances of classic SCG applications, ranging from toy benchmarks to large-scale real-world examples. The results show the proposed algorithms are reliable and scalable local solvers that deliver high-quality solutions with greater regularity and significantly less computational effort compared to the many incumbents included in our study.
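The unrolling idea described above can be illustrated with a minimal JAX sketch. It is not the authors' implementation. The network is a hypothetical two-route example with unit demand: route 0 costs its own flow plus a toll set by the leader, and route 1 has a constant cost of 1. The update rule in `ild_step`, the step size `r`, the uniform starting point, and the names `route_costs`, `unrolled_flow`, `leader_loss`, and `solve` are all assumptions made for illustration. The leader minimizes total travel time (tolls excluded) by plain gradient descent, with the derivative obtained by differentiating through the unrolled dynamic.

```python
import jax
import jax.numpy as jnp


def route_costs(p, toll):
    # Hypothetical network: route 0 is congestible and tolled, route 1 has constant cost 1.
    return jnp.array([p[0] + toll, 1.0])


def ild_step(p, toll, r=1.0):
    # One imitative-logit-style update (assumed form): reweight the current
    # route shares by exp(-r * cost) and renormalize so they sum to one.
    w = p * jnp.exp(-r * route_costs(p, toll))
    return w / jnp.sum(w)


def unrolled_flow(toll, num_steps):
    # Unroll the followers' dynamic for a fixed number of steps from a uniform start.
    p0 = jnp.array([0.5, 0.5])

    def body(p, _):
        return ild_step(p, toll), None

    p_final, _ = jax.lax.scan(body, p0, None, length=num_steps)
    return p_final


def leader_loss(toll, num_steps):
    # Leader's objective: total travel time at the end of the trajectory, tolls excluded.
    p = unrolled_flow(toll, num_steps)
    return p[0] * p[0] + p[1] * 1.0


def solve(num_steps, iters=50, lr=0.5, toll0=0.1):
    # Gradient descent on the toll; jax.grad differentiates through every unrolled step.
    grad_fn = jax.jit(jax.grad(leader_loss), static_argnums=1)
    toll = toll0
    for _ in range(iters):
        toll = toll - lr * grad_fn(toll, num_steps)
    return toll


if __name__ == "__main__":
    print("long look-ahead toll:", float(solve(num_steps=100)))
    print("short look-ahead toll:", float(solve(num_steps=5)))
```

For this toy network the equilibrium flow on route 0 is 1 - toll, so the total travel time (1 - toll)^2 + toll is minimized at a toll of 0.5. That gives a reference value for checking the long-horizon run. Calling `solve` with a small `num_steps` mimics the second algorithm's limited look-ahead, where the leader differentiates through only a few steps of the followers' trajectory.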
