While discounted payoff games and the classic games that reduce to them, such as parity and mean-payoff games, are symmetric, their solutions are not. We have taken a fresh view of the properties that optimal solutions need to have, and devised a novel way to converge to them that is entirely symmetric. We achieve this by building a constraint system that uses every edge to define an inequation, and by updating an objective function that takes a single outgoing edge for each vertex into account. These edges loosely represent strategies of both players, and the objective function intuitively asks to make the inequations for these edges sharp. Where an inequation is not sharp, there is an 'error', represented by the difference between its two sides, which is 0 where the inequation is sharp. Hence, the objective is to minimise the sum of these errors. For co-optimal strategies, and only for them, it can be achieved that all selected inequations are sharp or, equivalently, that the sum of these errors is zero. As long as no co-optimal strategies have been found, we step-wise reduce the error by improving the solution for a given objective function or by improving the objective function for a given solution. This also challenges the gospel that methods for solving payoff games are based on either strategy improvement or value iteration.
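To make the notions of inequation, sharpness, and error concrete, the following is a minimal sketch on a two-vertex discounted payoff game. It is not the authors' implementation. The sign convention is an assumption not stated above: the value of a maximiser vertex is at least the weight plus the discounted value of the successor for each outgoing edge, and the value of a minimiser vertex is at most that. The edge encoding, the toy game, and all function names are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: an edge is (source, target, weight, discount).
Edge = Tuple[str, str, float, float]


def edge_value(edge: Edge, val: Dict[str, float]) -> float:
    """Right-hand side of an edge's inequation: weight + discount * val(target)."""
    _, target, weight, discount = edge
    return weight + discount * val[target]


def error(edge: Edge, owner: Dict[str, str], val: Dict[str, float]) -> float:
    """Gap between the two sides of an edge's inequation; 0 when it is sharp."""
    source = edge[0]
    gap = val[source] - edge_value(edge, val)
    # Assumed convention: val(v) >= rhs for "max" vertices, val(v) <= rhs for "min".
    return gap if owner[source] == "max" else -gap


def is_feasible(edges: List[Edge], owner: Dict[str, str],
                val: Dict[str, float], tol: float = 1e-9) -> bool:
    """A valuation is feasible if the inequation of every edge holds."""
    return all(error(e, owner, val) >= -tol for e in edges)


def total_error(selected: List[Edge], owner: Dict[str, str],
                val: Dict[str, float]) -> float:
    """Objective: sum of errors over one selected outgoing edge per vertex."""
    return sum(error(e, owner, val) for e in selected)


# Toy game (illustrative): vertex "a" belongs to the maximiser, "b" to the minimiser.
owner = {"a": "max", "b": "min"}
a_loop: Edge = ("a", "a", 1.0, 0.5)
a_to_b: Edge = ("a", "b", 0.0, 0.5)
b_to_a: Edge = ("b", "a", 0.0, 0.5)
b_loop: Edge = ("b", "b", 2.0, 0.5)
edges = [a_loop, a_to_b, b_to_a, b_loop]

# Valuation that satisfies all four inequations.
val = {"a": 2.0, "b": 1.0}

sharp_selection = [a_loop, b_to_a]   # both selected inequations are sharp
other_selection = [a_to_b, b_loop]   # both selected inequations have a gap

print(is_feasible(edges, owner, val))
print(total_error(sharp_selection, owner, val))
print(total_error(other_selection, owner, val))
```

In this toy instance the first selection makes every selected inequation sharp, so its error sum is zero, while the second selection leaves a positive error on both vertices for the same valuation. The sketch only evaluates the objective. It does not perform the step-wise improvement of the solution or of the objective function described above.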