In this paper, we consider a learning problem among non-cooperative agents interacting in a time-varying system. Specifically, we focus on repeated linear quadratic network games in which the network of interactions changes over time and agents may be absent at any given iteration. For tractability, we assume that at each iteration the network of interactions is sampled from an underlying random network model and that each agent participates at random with a given probability. Under these assumptions, we consider a gradient-based learning algorithm and establish almost sure convergence of the agents' strategies to the Nash equilibrium of the game played over the expected network. Additionally, we prove that, in the large population regime, the learned strategy is an $\epsilon$-Nash equilibrium for each stage game with high probability. We validate our results on an online market application.
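The learning scheme described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the stage cost $J_i(x_i, z_i) = \tfrac{1}{2}x_i^2 - x_i(\theta_i + \alpha z_i)$ with local aggregate $z_i = \sum_j A_{ij} x_j$, an Erdős–Rényi network with link probability `p_link` and weights $1/n$, independent participation with probability `q_part`, absent agents being invisible to their neighbours, the step size $\tau_k = (k+1)^{-0.75}$, and the box constraint $[0, x_{\max}]$. Under these assumptions the expected weight seen by a participating agent is `p_link * q_part / n`, and the sketch compares the learned strategies with the equilibrium of the game on that expected network.

```python
import random


def clip(v, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def sample_stage(n, p_link, q_part, rng):
    """Sample one stage game: a participation mask and a symmetric
    Erdos-Renyi network (weights 1/n) among the participating agents."""
    present = [rng.random() < q_part for _ in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if present[i] and present[j] and rng.random() < p_link:
                adj[i][j] = adj[j][i] = 1.0 / n
    return adj, present


def expected_equilibrium(theta, alpha, w_bar, x_max, iters=200):
    """Equilibrium of the game on the expected network, assumed here to have
    uniform off-diagonal weight w_bar. Iterates the best-response map, which
    is a contraction when alpha * w_bar * (n - 1) < 1."""
    x = list(theta)
    for _ in range(iters):
        total = sum(x)
        x = [clip(t + alpha * w_bar * (total - xi), 0.0, x_max)
             for t, xi in zip(theta, x)]
    return x


def gradient_play(theta, alpha, p_link, q_part, x_max, steps, seed=0):
    """Projected gradient play: at each iteration, every participating agent
    takes one gradient step on its own cost in the sampled stage game."""
    rng = random.Random(seed)
    n = len(theta)
    x = [0.0] * n
    for k in range(steps):
        adj, present = sample_stage(n, p_link, q_part, rng)
        tau = 1.0 / (k + 1) ** 0.75  # assumed diminishing step size
        new_x = list(x)
        for i in range(n):
            if not present[i]:
                continue  # absent agents keep their previous strategy
            z_i = sum(adj[i][j] * x[j] for j in range(n))
            grad = x[i] - theta[i] - alpha * z_i  # dJ_i / dx_i
            new_x[i] = clip(x[i] - tau * grad, 0.0, x_max)
        x = new_x
    return x


if __name__ == "__main__":
    n, alpha, p_link, q_part, x_max = 20, 0.5, 0.5, 0.8, 10.0
    theta = [1.0 + i / (n - 1) for i in range(n)]
    x_star = expected_equilibrium(theta, alpha, p_link * q_part / n, x_max)
    x_hat = gradient_play(theta, alpha, p_link, q_part, x_max, steps=3000)
    gap = max(abs(a - b) for a, b in zip(x_hat, x_star))
    print("max distance to expected-network equilibrium:", gap)
```

The step size is chosen to be non-summable but square-summable, the usual condition for stochastic approximation schemes of this kind; other schedules with the same property should behave similarly in this toy setting.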