Strategy commitment is a useful tactic in game playing, but to exploit it a leader must first learn enough about the follower's payoff function. This gives the follower a chance to supply false information and influence the final outcome of the game. By misreporting a carefully contrived payoff function to the learning leader, the follower may induce an outcome that benefits him more than the outcome that arises when he behaves truthfully. We study the follower's optimal manipulation through such strategic behavior in extensive-form games, taking into account different follower attitudes. An optimistic follower maximizes his true utility over all game outcomes that can be induced by some payoff function. A pessimistic follower considers only misreported payoff functions that induce a unique game outcome. For every setting considered in this paper, we characterize all game outcomes that can be successfully induced, and we show that the follower can find the optimal way of misreporting his private payoff information in polynomial time. Our work completely resolves the follower's optimal manipulation problem on an extensive-form game tree.
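To make the manipulation idea concrete, here is a minimal sketch on a toy game. It is not the polynomial-time algorithm of the paper. It assumes a hypothetical two-stage tree in which the leader commits to a pure action, the follower reports a strict preference at each of his decision nodes, and the follower must then act consistently with his report. The payoff numbers and all function names are invented for illustration, and the search is a brute-force enumeration that only works for tiny games.

```python
from itertools import product

# Hypothetical toy game: the leader picks a in {0, 1}, then the follower picks b in {0, 1}.
# PAYOFFS[(a, b)] = (leader utility, follower's TRUE utility). Numbers are illustrative only.
PAYOFFS = {
    (0, 0): (3, 1),
    (0, 1): (0, 0),
    (1, 0): (2, 0),
    (1, 1): (1, 5),
}
LEADER_ACTIONS = (0, 1)
FOLLOWER_ACTIONS = (0, 1)


def leader_choice(response):
    """Leader's best pure commitment, given the response the follower reports for each action."""
    return max(LEADER_ACTIONS, key=lambda a: PAYOFFS[(a, response[a])][0])


def induced_outcome(response):
    """Outcome reached when the leader trusts the report and the follower plays as reported."""
    a = leader_choice(response)
    return (a, response[a])


def truthful_response():
    """Follower's true best response at each of his decision nodes."""
    return {a: max(FOLLOWER_ACTIONS, key=lambda b: PAYOFFS[(a, b)][1])
            for a in LEADER_ACTIONS}


def best_manipulation():
    """Brute force over all reported response functions; return (outcome, true follower utility)."""
    best = None
    for choice in product(FOLLOWER_ACTIONS, repeat=len(LEADER_ACTIONS)):
        response = dict(zip(LEADER_ACTIONS, choice))
        outcome = induced_outcome(response)
        value = PAYOFFS[outcome][1]  # evaluate with the TRUE payoff, not the reported one
        if best is None or value > best[1]:
            best = (outcome, value)
    return best


print(induced_outcome(truthful_response()))  # outcome under truthful reporting
print(best_manipulation())                   # best outcome the follower can induce
```

In this toy instance, truthful reporting leads the leader to choose action 0, which leaves the follower with true utility 1. If the follower instead pretends to prefer `b = 1` after action 0, the leader switches to action 1 and the follower obtains true utility 5. The leader's payoffs here are chosen to be distinct, so the optimistic and pessimistic attitudes from the abstract coincide; distinguishing them would require modeling ties.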