This work studies parameter identification in a generalized non-cooperative game in which each player's cost function depends on an observable signal and on unknown parameters. We consider the setting where the equilibrium of the game can be observed, corrupted by noise, at certain values of the observable signal, and the goal is to identify the unknown parameters from these observations. Assuming that the observable signals and the corresponding noise-corrupted equilibria are acquired sequentially, we formulate the parameter identification problem as an online optimization problem and introduce a novel online parameter identification algorithm. Specifically, we construct a regularized loss function that balances conservativeness and correctiveness: the conservativeness term ensures that the new estimate does not deviate significantly from the current estimate, while the correctiveness term is captured by the Karush-Kuhn-Tucker conditions. We then prove that when the players' cost functions are linear in the unknown parameters and the learning rate of the algorithm satisfies $\mu_k \propto 1/\sqrt{k}$, along with other assumptions, the proposed algorithm achieves a regret bound of $O(\sqrt{K})$. Finally, we conduct numerical simulations on a Nash-Cournot problem to demonstrate that the performance of the online identification algorithm is comparable to that of its offline counterpart.
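The conservativeness/correctiveness trade-off can be illustrated with a minimal sketch. It is not the algorithm of this work, whose exact loss function is not stated in the abstract. The sketch assumes an unconstrained Nash-Cournot game with price `s - b * sum(x)` and unit costs `c`, so the Karush-Kuhn-Tucker conditions reduce to stationarity, and it assumes a squared-norm form for both terms of the loss. All function names, the signal range, and the parameter values are hypothetical.

```python
import numpy as np


def cournot_equilibrium(s, c, b):
    """Interior equilibrium when player i minimizes c_i*x_i - x_i*(s - b*sum(x))."""
    n = c.size
    total = (n * s - c.sum()) / (b * (n + 1))  # aggregate output sum(x)
    return (s - c) / b - total                 # x_i from the stationarity condition


def kkt_regressor(x, s):
    """Write the stationarity residual c_i - s + b*(sum(x) + x_i) as Phi @ theta - y,
    with theta = (c_1, ..., c_n, b). The residual is linear in theta."""
    n = x.size
    Phi = np.hstack([np.eye(n), (x.sum() + x)[:, None]])
    y = np.full(n, s)
    return Phi, y


def online_step(theta, Phi, y, mu):
    """Assumed loss: 0.5*||t - theta||^2 (conservativeness)
    + 0.5*mu*||Phi @ t - y||^2 (correctiveness). Returns its minimizer over t."""
    d = theta.size
    return np.linalg.solve(np.eye(d) + mu * Phi.T @ Phi, theta + mu * Phi.T @ y)


def identify(theta0, c, b, K=2000, mu0=1.0, noise_std=0.01,
             s_range=(4.0, 10.0), seed=0):
    """Process K sequential (signal, noisy equilibrium) pairs with mu_k = mu0/sqrt(k)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(1, K + 1):
        s = rng.uniform(*s_range)  # observable signal (demand intercept)
        x_obs = cournot_equilibrium(s, c, b) + noise_std * rng.standard_normal(c.size)
        Phi, y = kkt_regressor(x_obs, s)
        theta = online_step(theta, Phi, y, mu0 / np.sqrt(k))
    return theta


if __name__ == "__main__":
    c_true, b_true = np.array([0.5, 1.0, 1.5]), 2.0  # hypothetical ground truth
    estimate = identify(np.ones(4), c_true, b_true)
    print("estimate:", estimate, "truth:", np.append(c_true, b_true))
```

Because the assumed loss is quadratic, each update has a closed form and needs only one linear solve. Inequality constraints and their multipliers, which a generalized game would introduce, are omitted here; the default signal range is chosen so that the equilibrium stays interior for the example parameters.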