Blackwell's approachability is a very general sequential decision framework in which a Decision Maker obtains vector-valued outcomes and aims for the average outcome to converge to a given "target" set. Blackwell gave a sufficient condition for the Decision Maker to have a strategy guaranteeing such convergence against an adversarial environment, as well as what we now call Blackwell's algorithm, which ensures convergence whenever that condition holds. Blackwell's approachability has since been applied to numerous problems, in particular in online learning and game theory. We extend this framework by allowing the outcome function and the dot product to be time-dependent. We establish a general guarantee for the natural extension of Blackwell's algorithm to this framework. In the case where the target set is an orthant, we present a family of time-dependent dot products that yields different convergence speeds for each coordinate of the average outcome. We apply this framework to the Big Match (one of the most important toy examples of stochastic games), where an $\epsilon$-uniformly optimal strategy for Player I is given by Blackwell's algorithm in a well-chosen auxiliary approachability problem.
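For readers unfamiliar with Blackwell's algorithm, the following is a minimal sketch of its classical, time-independent form when the target set is the nonpositive orthant. It is not the time-dependent extension studied here. As an illustrative assumption, the outcome vector is taken to be the expected regret of the Decision Maker in a finite game with payoffs in $[0,1]$, in which case the algorithm reduces to what is usually called regret matching. The function name `blackwell_orthant`, the payoff matrix, and the randomly drawn environment are all hypothetical choices made for the example.

```python
import math
import random


def blackwell_orthant(payoffs, T, seed=0):
    """Classical Blackwell algorithm for the nonpositive orthant (sketch).

    payoffs[i][b] is the payoff in [0, 1] of action i against environment
    action b. The outcome at each round is the expected-regret vector.
    Returns the average outcome and its Euclidean distance to the orthant.
    """
    rng = random.Random(seed)
    K = len(payoffs)
    n_env = len(payoffs[0])
    cum = [0.0] * K  # cumulative outcome (regret) vector

    for _ in range(T):
        # Direction from the projection onto the orthant to the average
        # outcome: the positive part of the cumulative outcome.
        pos = [max(c, 0.0) for c in cum]
        s = sum(pos)
        # Mixed action proportional to the positive part; uniform (an
        # arbitrary choice) when the average already lies in the target.
        x = [p / s for p in pos] if s > 0 else [1.0 / K] * K

        # Assumption: the environment plays at random here; the bound
        # checked below holds for any sequence of environment actions.
        b = rng.randrange(n_env)
        u = [payoffs[i][b] for i in range(K)]
        expected = sum(xi * ui for xi, ui in zip(x, u))

        # Outcome vector: regret of the mixed action against each action.
        for i in range(K):
            cum[i] += u[i] - expected

    avg = [c / T for c in cum]
    dist = math.sqrt(sum(max(a, 0.0) ** 2 for a in avg))
    return avg, dist


if __name__ == "__main__":
    K, T = 3, 10_000
    identity = [[1.0 if i == b else 0.0 for b in range(K)] for i in range(K)]
    avg, dist = blackwell_orthant(identity, T)
    # Classical guarantee in this setting: dist <= sqrt(K / T).
    print(dist, math.sqrt(K / T))
```

With this choice of mixed action, the outcome vector is orthogonal to the positive part of the cumulative outcome at every round, which is what yields the $\sqrt{K/T}$ bound on the distance to the orthant under the stated assumptions.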