In this paper, we consider a learning problem among non-cooperative agents interacting in a time-varying system. Specifically, we focus on repeated linear quadratic network games in which the network of interactions changes over time and agents may not participate in every iteration. For tractability, we assume that at each iteration the network of interactions is sampled from an underlying random network model and that each agent participates at random with a given probability. Under these assumptions, we consider a gradient-based learning algorithm and establish almost sure convergence of the agents' strategies to the Nash equilibrium of the game played over the expected network. Additionally, we prove that, in the large-population regime, the learned strategy is an $\epsilon$-Nash equilibrium of each stage game with high probability. We validate our results on an online market application.
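The learning dynamics summarized above can be illustrated with a minimal Python sketch. Every model detail here is a hypothetical assumption for illustration, not the paper's formulation. We assume scalar, unconstrained strategies and the cost $J_i = \tfrac{1}{2}x_i^2 - x_i\,(b_i + \alpha \sum_j A^k_{ij} s^k_j x_j)$. We also assume independent Bernoulli edges with hypothetical probabilities `P`, and Bernoulli(`p`) participation `s`, where only participating neighbors enter an agent's payoff. The step size is diminishing. Under these assumptions, the expected network is $\alpha p P$, and the iterates can be compared against the Nash equilibrium $(I - \alpha p P)^{-1} b$ of the expected-network game.

```python
import numpy as np


def simulate_gradient_play(n=20, p=0.8, alpha=0.1, iters=30_000, seed=0):
    """Gradient play in a HYPOTHETICAL linear quadratic network game.

    Assumed stage cost for agent i (illustrative only):
        J_i = 0.5 * x_i**2 - x_i * (b_i + alpha * sum_j A_ij * s_j * x_j)
    with A_ij ~ Bernoulli(P_ij) (sampled network) and s_j ~ Bernoulli(p)
    (participation), both redrawn independently at every iteration.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical edge probabilities of the random network model, no self-loops.
    P = rng.uniform(0.0, 0.5, size=(n, n))
    np.fill_diagonal(P, 0.0)
    b = np.ones(n)  # hypothetical standalone marginal benefits
    x = np.zeros(n)  # initial strategies

    for k in range(iters):
        gamma = 1.0 / (k + 1) ** 0.7  # diminishing step size
        s = rng.random(n) < p  # which agents participate at this stage
        A = rng.random((n, n)) < P  # network sampled for this stage
        # Partial gradient of J_i w.r.t. x_i; only participating neighbours count.
        grad = x - b - alpha * (A @ (s * x))
        # Only participating agents update their strategies.
        x = x - gamma * s * grad

    # Nash equilibrium of the game played over the expected network alpha * p * P.
    x_star = np.linalg.solve(np.eye(n) - alpha * p * P, b)
    return x, x_star


x, x_star = simulate_gradient_play()
print("max |x - x_star| =", np.max(np.abs(x - x_star)))
```

Here `alpha` is chosen small enough that the expected interaction matrix $\alpha p P$ has spectral radius below one, so the expected-network equilibrium is unique. The printed deviation is only a rough numerical check of the almost sure convergence claim under these illustrative assumptions.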