Incentive design problems consider a system planner who steers self-interested agents toward a socially optimal Nash equilibrium by issuing incentives in the presence of information asymmetry, that is, uncertainty about the agents' cost functions. A common approach formulates the problem as a Mathematical Program with Equilibrium Constraints (MPEC) and optimizes incentives using hypergradients (the total derivatives of the planner's objective with respect to incentives). However, computing or approximating the hypergradients typically requires full or partial knowledge of equilibrium sensitivities to incentives, which is generally unavailable under information asymmetry. In this paper, we propose a hypergradient-free incentive law, called the social-gradient flow, for incentive design when the planner's social cost depends on the agents' joint actions. We prove that the social cost gradient is always a descent direction for the planner's objective, irrespective of the agent cost landscape. In the idealized setting where equilibrium responses are observable, the social-gradient flow converges to the unique socially optimal incentive. When equilibria are not directly observable, the social-gradient flow emerges as the slow-timescale limit of a two-timescale interaction in which agents' strategies evolve on a faster timescale. We establish that the joint strategy-incentive dynamics converge to the social optimum for any agent learning rule that asymptotically tracks the equilibrium. Numerical experiments validate the theoretical results.
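The two-timescale interaction described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it is one possible toy instance built on assumptions made purely for illustration: two agents with scalar actions and quadratic costs, an incentive that enters agent i's cost as the linear term theta_i * x_i, agents that learn by gradient play, a quadratic social cost, and a forward-Euler discretization in which the planner's step is ten times smaller than the agents' step. All names (`A`, `c`, `x_des`, `simulate`, and so on) are hypothetical. The planner only evaluates the social cost gradient at the observed actions and never uses the agents' cost parameters or any equilibrium sensitivity.

```python
# Toy two-agent quadratic game. Every number and name here is an illustrative
# assumption, not a quantity from the paper.
A = [[2.0, 0.5], [-0.3, 1.5]]  # game Jacobian; its symmetric part is positive definite
c = [1.0, -2.0]                # agents' linear cost terms (hidden from the planner)
x_des = [0.5, 1.0]             # minimizer of the social cost 0.5 * ||x - x_des||^2


def pseudo_gradient(x, theta):
    """Gradient of each agent's own cost in its own action: (A x + c + theta)_i."""
    return [sum(A[i][j] * x[j] for j in range(2)) + c[i] + theta[i] for i in range(2)]


def social_gradient(x):
    """Gradient of the social cost with respect to the joint action."""
    return [x[i] - x_des[i] for i in range(2)]


def simulate(steps=5000, agent_step=0.1, planner_step=0.01):
    x = [0.0, 0.0]      # agents' actions (fast variables)
    theta = [0.0, 0.0]  # planner's incentives (slow variables)
    for _ in range(steps):
        g = pseudo_gradient(x, theta)
        s = social_gradient(x)
        # Fast timescale: each agent takes a gradient step on its own incentivized cost.
        x = [x[i] - agent_step * g[i] for i in range(2)]
        # Slow timescale: the planner moves the incentive along the social cost gradient,
        # raising the charge on an action that is above its socially desired level.
        theta = [theta[i] + planner_step * s[i] for i in range(2)]
    return x, theta


x_final, theta_final = simulate()
print("actions:", x_final)
print("incentives:", theta_final)
```

In this toy instance the unique rest point of the coupled iteration has the actions equal to `x_des` and the incentives equal to -(A x_des + c), which the planner reaches without ever reading `A` or `c`. Whether this matches the exact form of the law analyzed in the paper cannot be confirmed from the abstract alone.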