We propose an algorithm to solve a class of Stackelberg games (possibly with multiple followers) in a follower-agnostic manner. In particular, unlike other contemporary works, our algorithm requires neither an oracle estimator for the gradient of the leader's objective nor knowledge of the followers' utility functions or strategy spaces. Instead, we design a two-loop algorithm in which the leader updates its strategy using a specially constructed gradient estimator, obtained by probing the followers with specially designed strategies. Upon receiving these strategies, the followers engage in an adaptation rule such that their joint strategy converges to a neighborhood of an equilibrium; this near-equilibrium joint strategy is the only information the leader observes when constructing the gradient estimator. We provide non-asymptotic convergence rates to stationary points of the leader's objective without assuming convexity of the closed-loop function, and we further show asymptotic convergence to a local minimum of the leader's objective.
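The two-loop structure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. It uses a standard two-point zeroth-order estimator along a random unit direction as a stand-in for the paper's gradient estimator. It also assumes a single follower running gradient play and a hypothetical quadratic toy game. All names (`follower_adapt`, `leader_cost`, `random_unit_vector`, `follower_agnostic_leader`) and all step sizes are illustrative choices. The leader never touches the follower's cost: it only observes the follower's settled strategy and evaluates its own objective there.

```python
import math
import random


# Hypothetical toy game (an assumption for illustration, not from the paper):
# follower cost f(x, y) = 0.5 * ||y - 2x||^2, so its equilibrium is y*(x) = 2x;
# leader cost J(x, y) = 0.5 * ||x - 1||^2 + 0.5 * ||y - 1||^2.
def follower_adapt(x, y, steps=30, lr=0.5):
    """Inner loop: the follower runs gradient play on its own (hidden) cost.

    The leader only sees the returned strategy y, never the follower's cost.
    """
    for _ in range(steps):
        y = [yi - lr * (yi - 2.0 * xi) for xi, yi in zip(x, y)]
    return y


def leader_cost(x, y):
    """The leader's own objective, evaluated at the observed follower strategy."""
    return (0.5 * sum((xi - 1.0) ** 2 for xi in x)
            + 0.5 * sum((yi - 1.0) ** 2 for yi in y))


def random_unit_vector(d, rng):
    """Direction drawn uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def follower_agnostic_leader(x0, outer_iters=500, lr=0.05, delta=0.1, seed=0):
    """Outer loop: zeroth-order gradient descent on the closed-loop cost.

    Each probe x +/- delta*u lets the follower settle near its equilibrium;
    the leader then differences its own cost at the two observed responses.
    """
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    y = [0.0] * d  # follower's current strategy, warm-started across probes
    for _ in range(outer_iters):
        u = random_unit_vector(d, rng)
        x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
        x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
        y_plus = follower_adapt(x_plus, y)
        y_minus = follower_adapt(x_minus, y_plus)
        diff = leader_cost(x_plus, y_plus) - leader_cost(x_minus, y_minus)
        # Two-point estimate of the closed-loop gradient along direction u.
        g = [d / (2.0 * delta) * diff * ui for ui in u]
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        y = y_minus
    return x
```

For this toy game the closed-loop cost is L(x) = 0.5(x - 1)^2 + 0.5(2x - 1)^2 per coordinate, whose minimizer is x = 0.6, so `follower_agnostic_leader([0.0, 0.0])` should return values close to `[0.6, 0.6]`. Warm-starting the follower from its previous strategy keeps the inner loop short while still landing near equilibrium at each probe.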