While discounted payoff games, and the classic games that reduce to them, such as parity and mean-payoff games, are symmetric, their solutions are not. We have taken a fresh look at the properties that optimal solutions need to have, and devised a novel way to converge to them that is entirely symmetric. We achieve this by building a constraint system that uses every edge to define an inequation, and by updating the objective function so that it takes a single outgoing edge for each vertex into account. These edges loosely represent strategies of both players, and the objective function intuitively asks to make the inequations for these edges sharp, leading to an `error' of 0. This can be achieved for co-optimal strategies, and only for them. While we have not found them, we improve the error step by step, either by improving the solution for a given objective function or by improving the objective function for a given solution. This also challenges the gospel that methods for solving payoff games are either based on strategy improvement or on value iteration.
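To make the constraint system and the `error' concrete, the following is a minimal sketch under assumptions that the abstract does not state. It assumes each edge carries a weight and a discount factor, that a maximiser vertex v with an edge to v' requires val(v) >= weight + discount * val(v'), that a minimiser vertex requires the reverse inequality, and that the error is the sum of the gaps on the selected edges. The names Edge, offset, is_feasible and error are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    weight: float
    discount: float  # assumed to lie strictly between 0 and 1


def offset(val, e):
    """Gap between the two sides of the inequation defined by edge e."""
    return val[e.src] - (e.weight + e.discount * val[e.dst])


def is_feasible(val, edges, owner, tol=1e-9):
    """Every edge contributes one inequation (assumed form):
    offset >= 0 at maximiser vertices, offset <= 0 at minimiser vertices."""
    for e in edges:
        off = offset(val, e)
        if owner[e.src] == "max" and off < -tol:
            return False
        if owner[e.src] == "min" and off > tol:
            return False
    return True


def error(val, selected, owner):
    """Objective value for one selected outgoing edge per vertex:
    the sum of the gaps on those edges, which is 0 when all are sharp."""
    total = 0.0
    for e in selected.values():
        sign = 1.0 if owner[e.src] == "max" else -1.0
        total += sign * offset(val, e)
    return total


# Toy game (illustrative data): "a" belongs to the maximiser, "b" to the minimiser.
owner = {"a": "max", "b": "min"}
aa = Edge("a", "a", 1.0, 0.5)
ab = Edge("a", "b", 0.0, 0.5)
ba = Edge("b", "a", 2.0, 0.5)
bb = Edge("b", "b", 0.0, 0.5)
edges = [aa, ab, ba, bb]

val = {"a": 2.0, "b": 0.0}          # candidate solution
sharp = {"a": aa, "b": bb}          # selected edges that are all sharp for val
not_sharp = {"a": ab, "b": bb}      # a selection with a non-sharp edge

print(is_feasible(val, edges, owner))   # True
print(error(val, sharp, owner))         # 0.0
print(error(val, not_sharp, owner))     # 2.0
```

In this toy game the self-loops at both vertices form a pair of co-optimal strategies, so the selected inequations can be made sharp and the error is 0. Selecting the edge from "a" to "b" instead leaves a gap of 2 for the same solution, which is the kind of error that would then be reduced either by changing the solution or by changing the selected edges.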