We study the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium with a value inside a target range, while minimizing the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of some game, and establish necessary and sufficient conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then applies a random perturbation, yielding a modification plan with near-optimal cost.
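To make the problem statement concrete, the following is a minimal sketch of a heavily simplified special case, not the algorithm proposed in this work. It assumes a single-state game (a matrix game) in which the row player maximizes and the column player minimizes, a deterministic target profile, an entrywise L1 modification cost, and a fixed strictness margin. Under these assumptions, a strict saddle point is the unique Nash equilibrium, the constraints are linear, and the cost is a convex piecewise-linear function of the target entry's value, so checking its breakpoints suffices. The function name `install_pure_equilibrium` and the parameter `margin` are hypothetical, and the random perturbation step is not reproduced here.

```python
def install_pure_equilibrium(R0, i_star, j_star, lo, hi, margin=1.0):
    """Minimal-L1-cost edit of payoff matrix R0 that makes (i_star, j_star)
    a strict saddle point with value in [lo, hi].

    Assumed convention: the row player maximizes, the column player minimizes.
    """
    n_rows, n_cols = len(R0), len(R0[0])

    def modified(v):
        # For a fixed equilibrium value v, the cheapest feasible matrix
        # changes only the target row and target column.
        R = [row[:] for row in R0]
        R[i_star][j_star] = v
        for i in range(n_rows):
            if i != i_star:  # row deviations must be strictly worse
                R[i][j_star] = min(R0[i][j_star], v - margin)
        for j in range(n_cols):
            if j != j_star:  # column deviations must be strictly worse
                R[i_star][j] = max(R0[i_star][j], v + margin)
        return R

    def cost(R):
        return sum(abs(a - b) for ra, rb in zip(R, R0) for a, b in zip(ra, rb))

    # The cost is convex and piecewise linear in v, so a minimizer lies at
    # a breakpoint or at an endpoint of the allowed value range.
    candidates = {lo, hi, R0[i_star][j_star]}
    candidates |= {R0[i][j_star] + margin for i in range(n_rows) if i != i_star}
    candidates |= {R0[i_star][j] - margin for j in range(n_cols) if j != j_star}
    clipped = [min(max(v, lo), hi) for v in candidates]
    best_v = min(clipped, key=lambda v: cost(modified(v)))
    R = modified(best_v)
    return R, cost(R)


# Example: install the top-left cell as the unique equilibrium.
R_new, total_cost = install_pure_equilibrium(
    [[1.0, 0.0], [2.0, 3.0]], i_star=0, j_star=0, lo=-5.0, hi=5.0
)
print(R_new, total_cost)  # [[1.0, 2.0], [0.0, 3.0]] 4.0
```

In the example, the original top-left cell is not an equilibrium because both players gain by deviating. The sketch keeps the value at 1 and pays a cost of 2 on each of the two deviating entries. Stochastic target profiles and multi-state Markov games need the more general treatment described above.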