We propose an adaptive incentive mechanism that learns the optimal incentives in environments where players continuously update their strategies. The mechanism updates incentives based on each player's externality, defined as the difference between the player's marginal cost and the operator's marginal cost at each time step. Incentives are updated on a slower timescale than the players' learning dynamics, which yields a two-timescale coupled dynamical system. Notably, the mechanism is agnostic to the specific learning dynamics that players use to update their strategies. We show that any fixed point of the adaptive incentive mechanism corresponds to the optimal incentive mechanism, under which the Nash equilibrium coincides with the socially optimal strategy. Additionally, we provide sufficient conditions under which the adaptive mechanism converges to a fixed point. Our results apply to both atomic and non-atomic games. To demonstrate the effectiveness of the mechanism, we verify the convergence conditions in two practically relevant classes of games: atomic aggregative games and non-atomic routing games.
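To make the two-timescale idea concrete, the following is a minimal sketch on a toy quadratic atomic aggregative game. Everything specific in it is an assumption made for illustration and is not taken from the paper: the cost functions, the linear payment `p_i * x_i`, gradient descent as the players' learning rule, the convex-combination form of the incentive update, the sign convention for the externality (operator's marginal cost minus the player's), the step sizes `alpha` and `beta`, and all function names.

```python
# Toy quadratic game (hypothetical, chosen only for illustration):
#   player cost  c_i(x) = 0.5 * x_i**2 - a[i] * x_i + b * x_i * sum(x)
#   social cost  Phi(x) = sum_i c_i(x)
# An incentive p_i adds the linear payment p_i * x_i to player i's cost.


def player_marginal_cost(x, a, b):
    """d c_i / d x_i for every player i (incentive not included)."""
    s = sum(x)
    return [xi - ai + b * s + b * xi for xi, ai in zip(x, a)]


def social_marginal_cost(x, a, b):
    """d Phi / d x_i for every player i."""
    s = sum(x)
    return [xi - ai + 2.0 * b * s for xi, ai in zip(x, a)]


def externality(x, a, b):
    """Assumed convention: operator's marginal cost minus player's marginal cost."""
    social = social_marginal_cost(x, a, b)
    private = player_marginal_cost(x, a, b)
    return [ds - dc for ds, dc in zip(social, private)]


def run_adaptive_incentives(a, b, alpha=0.1, beta=0.01, steps=5000):
    """Players take fast gradient steps (alpha); incentives move slowly (beta)."""
    n = len(a)
    x = [0.0] * n  # player strategies
    p = [0.0] * n  # per-player incentives
    for _ in range(steps):
        e = externality(x, a, b)
        # Each player's gradient of its own cost plus the payment p_i * x_i.
        grad = [g + pi for g, pi in zip(player_marginal_cost(x, a, b), p)]
        # Fast timescale: gradient-descent learners (any learning rule could go here).
        x = [xi - alpha * gi for xi, gi in zip(x, grad)]
        # Slow timescale: move each incentive toward the current externality.
        p = [(1.0 - beta) * pi + beta * ei for pi, ei in zip(p, e)]
    return x, p


def social_optimum(a, b):
    """Closed-form minimizer of Phi for this toy game."""
    s = sum(a) / (1.0 + 2.0 * b * len(a))
    return [ai - 2.0 * b * s for ai in a]


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0], 0.1
    x, p = run_adaptive_incentives(a, b)
    print("strategies:", x)
    print("incentives:", p)
    print("social optimum:", social_optimum(a, b))
```

In this toy game the externality reduces to `b * (sum(x) - x_i)`, so at a fixed point each player's first-order condition with the incentive matches the operator's first-order condition. Because `beta` is much smaller than `alpha`, the strategies settle quickly relative to the incentives, which is the timescale separation the abstract describes.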