This paper introduces the new concept of (follower) satisfaction in Stackelberg games and compares the standard Stackelberg game with its satisfaction version. Simulation results suggest that the follower's adoption of satisfaction generally increases the leader's utility. This new result is proven for the case in which the strategies the leader can commit to are restricted to be deterministic (pure strategies). The paper then addresses the application of regret-based algorithms to the Stackelberg problem. It is known that the follower adopts a no-regret position in a Stackelberg solution, but this is not generally true of the leader. The paper examines the convergence behaviour of unconditional and conditional regret matching (RM) algorithms in the Stackelberg setting. It shows that, in the examples considered, these algorithms converge either to a pure Nash equilibrium of the simultaneous-move game, where one exists, or to mixed strategies that do not have the "no-regret" property. In one case, the conditional RM algorithm, applied to both players, was observed to converge to a solution "close" to the Stackelberg solution. The paper argues that further research in this area, particularly in the satisfaction setting, could be fruitful.
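As a rough illustration of the kind of algorithm examined, the sketch below implements unconditional regret matching, in the sense of Hart and Mas-Colell, for a two-player bimatrix game in which both players update simultaneously from realised play. It is not the paper's implementation. The function name `regret_matching`, the payoff-matrix layout and the `rounds`/`seed` parameters are hypothetical. The conditional variant, which tracks the regret of replacing one specific action with another, is not shown.

```python
import random


def regret_matching(payoff_a, payoff_b, rounds=10000, seed=0):
    """Hypothetical sketch: unconditional regret matching in a bimatrix game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to the row player
    (e.g. the leader) and the column player (e.g. the follower) when row
    plays action i and column plays action j. Both players update
    simultaneously; there is no leader commitment in this sketch.
    Returns the empirical (time-averaged) mixed strategy of each player.
    """
    rng = random.Random(seed)
    m, n = len(payoff_a), len(payoff_a[0])
    regret_a = [0.0] * m  # cumulative regret for each row action
    regret_b = [0.0] * n  # cumulative regret for each column action
    count_a = [0] * m
    count_b = [0] * n

    def sample(regrets):
        # Play each action with probability proportional to its positive regret.
        positive = [max(r, 0.0) for r in regrets]
        if sum(positive) <= 0.0:
            return rng.randrange(len(regrets))  # uniform when no regret is positive
        return rng.choices(range(len(regrets)), weights=positive)[0]

    for _ in range(rounds):
        i = sample(regret_a)
        j = sample(regret_b)
        count_a[i] += 1
        count_b[j] += 1
        # Regret for not having played action k against the opponent's realised action.
        for k in range(m):
            regret_a[k] += payoff_a[k][j] - payoff_a[i][j]
        for k in range(n):
            regret_b[k] += payoff_b[i][k] - payoff_b[i][j]

    return [c / rounds for c in count_a], [c / rounds for c in count_b]


if __name__ == "__main__":
    # Prisoner's Dilemma: action 1 ("defect") strictly dominates for both players.
    leader, follower = regret_matching([[3, 0], [5, 1]], [[3, 5], [0, 1]])
    print(leader, follower)
```

In a game where each player has a strictly dominant action, such as the Prisoner's Dilemma used above, the empirical strategies concentrate on the pure Nash equilibrium of the simultaneous-move game. This fits the first of the two convergence outcomes described in the abstract.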