Recent studies have shown that, in a Stackelberg game, the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower, while causing significant payoff losses for the leader, sometimes completely defeating the leader's first-mover advantage. These findings are a warning to commitment optimizers, but the risk they indicate appears to be alleviated to some extent by the strict information advantage the manipulations rely on: the follower knows both players' payoffs in full, whereas the leader knows only their own. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario in which the follower initially has no information about the leader's payoffs and must learn to manipulate by interacting with the leader. The follower can gather the necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time, using polynomially many queries of the leader's optimal commitment.
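To make the query model concrete, the following is a minimal sketch, not the algorithm developed in the paper. It assumes a toy game with two leader actions and integer payoffs, so the leader's optimal commitment can be found exactly by checking the endpoints and the crossing points of the follower's payoff lines. The names `query_optimal_commitment` and `follower_value`, the payoff matrices, and the brute-force search over 0/1 reports are all illustrative assumptions; the brute-force search stands in for the polynomial-time learning procedure, which is not reproduced here.

```python
from fractions import Fraction
from itertools import product


def expected(M, p, j):
    """Expected payoff from a 2 x n matrix M when the leader plays
    row 0 with probability p and the follower plays column j."""
    return p * M[0][j] + (1 - p) * M[1][j]


def query_optimal_commitment(L, F_reported):
    """Leader's optimal commitment against reported follower payoffs.

    Assumes two leader actions and integer payoffs. Ties in the
    follower's reported best response are broken in the leader's favor.
    Returns (p, j, leader_value).
    """
    n = len(L[0])
    # The optimum lies at p = 0, p = 1, or where two reported
    # follower payoff lines cross.
    candidates = {Fraction(0), Fraction(1)}
    for j in range(n):
        for k in range(j + 1, n):
            slope_gap = ((F_reported[0][j] - F_reported[1][j])
                         - (F_reported[0][k] - F_reported[1][k]))
            if slope_gap != 0:
                p = Fraction(F_reported[1][k] - F_reported[1][j], slope_gap)
                if 0 <= p <= 1:
                    candidates.add(p)
    best = None
    for p in sorted(candidates):
        top = max(expected(F_reported, p, j) for j in range(n))
        for j in range(n):
            if expected(F_reported, p, j) == top:  # reported best response
                value = expected(L, p, j)
                if best is None or value > best[2]:
                    best = (p, j, value)
    return best


# Hypothetical toy game. The follower never reads L_true directly;
# it only observes the answers returned by the query function.
L_true = [[2, 4], [1, 3]]
F_true = [[1, 0], [0, 1]]


def follower_value(report):
    """True follower payoff when the leader optimizes against `report`
    and the follower responds as the report prescribes."""
    p, j, _ = query_optimal_commitment(L_true, report)
    return expected(F_true, p, j)


honest = follower_value(F_true)
# Brute-force stand-in for learning: one query per 0/1 report.
reports = [[list(r[:2]), list(r[2:])] for r in product((0, 1), repeat=4)]
best_report = max(reports, key=follower_value)
manipulated = follower_value(best_report)
print(honest, manipulated)
```

In this toy game, reporting truthfully gives the follower a payoff of 1/2, while the best contrived report found through queries gives 1. One such report is `[[1, 0], [1, 1]]`, against which the leader commits to the second row with certainty and the leader's own payoff drops from 7/2 to 3.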