This paper focuses on the online saddle point problem, which involves a sequence of two-player, time-varying convex-concave games. To account for the nonstationarity of the environment, we adopt the duality gap and the dynamic Nash equilibrium regret as performance metrics for algorithm design. We present three variants of the proximal point method: the Online Proximal Point Method~(OPPM), the Optimistic OPPM~(OptOPPM), and the OptOPPM with multiple predictors. Each algorithm guarantees upper bounds on both the duality gap and the dynamic Nash equilibrium regret, and these bounds are near-optimal when measured by the duality gap. In particular, in certain benign environments, such as sequences of stationary payoff functions, these algorithms attain nearly constant bounds on these metrics. Experimental results further validate the effectiveness of these algorithms. Lastly, this paper discusses potential concerns about the reliability of dynamic Nash equilibrium regret as a performance metric.
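To make the two ingredients of the abstract concrete, a proximal step on a convex-concave game and a duality-gap metric, here is a minimal Python sketch on a toy problem of our own choosing; it is not taken from the paper. We assume scalar quadratic games $f_t(x, y) = \frac{a}{2}x^2 + b\,xy - \frac{c}{2}y^2$ with $a, c > 0$, a plain implicit (proximal) update as a stand-in for OPPM, and the per-round duality gap summed over rounds as the cumulative metric. The names `duality_gap`, `prox_step`, and `run_oppm` are hypothetical.

```python
def duality_gap(a, b, c, x, y):
    """Duality gap of f(x, y) = a/2 x^2 + b x y - c/2 y^2 at (x, y).

    Computes max_{y'} f(x, y') - min_{x'} f(x', y) in closed form.
    Assumes a > 0 and c > 0 (strongly convex-concave toy game).
    """
    max_over_y = a / 2 * x**2 + (b * x) ** 2 / (2 * c)
    min_over_x = -((b * y) ** 2) / (2 * a) - c / 2 * y**2
    return max_over_y - min_over_x


def prox_step(a, b, c, x_t, y_t, eta):
    """Implicit update: the saddle point of
    f(x, y) + (x - x_t)^2 / (2 eta) - (y - y_t)^2 / (2 eta),
    obtained by solving the 2x2 linear stationarity system.
    """
    p = a + 1 / eta  # curvature in x after adding the proximal term
    q = c + 1 / eta  # curvature in -y after adding the proximal term
    denom = eta * (p * q + b**2)
    x_next = (q * x_t - b * y_t) / denom
    y_next = (p * y_t + b * x_t) / denom
    return x_next, y_next


def run_oppm(payoffs, eta=0.5, x0=1.0, y0=1.0):
    """Hypothetical online loop: play (x, y), observe f_t, then update.

    Returns the cumulative duality gap and the final iterate.
    """
    x, y = x0, y0
    total_gap = 0.0
    for a, b, c in payoffs:
        total_gap += duality_gap(a, b, c, x, y)  # loss of the played point
        x, y = prox_step(a, b, c, x, y, eta)     # proximal update on revealed f_t
    return total_gap, (x, y)


# Stationary environment: the same payoff function every round.
stationary = [(1.0, 2.0, 1.0)] * 200
total, last = run_oppm(stationary)
print(total, last)
```

In this toy setting the proximal map contracts the iterate toward the saddle point $(0, 0)$ by a factor of at least $1/(1 + \eta \min(a, c))$ per round, so the cumulative duality gap stays bounded as the horizon grows. This mirrors, under our simplifying assumptions only, the kind of nearly constant bound the abstract describes for stationary payoff sequences.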