We analyze independent policy-gradient (PG) learning in $N$-player linear-quadratic (LQ) stochastic differential games. Each player employs a distributed policy that depends only on its own state and updates this policy independently using the gradient of its own objective. We establish global linear convergence of these methods to (approximate) equilibria by showing that the LQ game admits an $\alpha$-potential structure, with $\alpha$ determined by the degree of pairwise interaction asymmetry. For pairwise-symmetric interactions, we construct an affine distributed equilibrium by minimizing the potential function and show that independent PG methods converge globally to this equilibrium, with complexity scaling linearly in the population size and logarithmically in the inverse of the desired accuracy. For asymmetric interactions, we prove that independent projected PG algorithms converge linearly to an approximate equilibrium, with suboptimality proportional to the degree of asymmetry. Numerical experiments confirm the theoretical results across both symmetric and asymmetric interaction networks.
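For intuition, the following is a minimal, self-contained Python sketch of the independent learning scheme described above, on an Euler discretization of a scalar $N$-player LQ game with symmetric pairwise interaction costs. All parameters here (`a`, `b`, `sigma`, `Q`, `r`, `eta`, `delta`, the horizon, and the population size) are illustrative assumptions rather than the paper's setup, and the exact policy gradient is replaced by a noisy two-point zeroth-order estimate; the sketch only illustrates the independence structure, not the paper's algorithm or experiments.

```python
# Minimal sketch (assumed parameters): independent policy-gradient learning in a
# discretized N-player LQ game with symmetric pairwise quadratic couplings.
import numpy as np

rng = np.random.default_rng(0)

N, T, dt = 5, 200, 0.02               # players, horizon steps, Euler step
a, b, sigma, r = -0.5, 1.0, 0.1, 1.0  # scalar dynamics and control-cost weight
Q = np.abs(rng.normal(size=(N, N)))   # pairwise interaction weights q_ij
Q = 0.5 * (Q + Q.T)                   # symmetrize: potential-game regime
np.fill_diagonal(Q, 0.0)

def cost(i, k):
    """Monte Carlo estimate of player i's cost when all players use gains k."""
    x = rng.normal(size=N)            # random initial states
    J = 0.0
    for _ in range(T):
        u = -k * x                    # distributed policies: own state only
        J += dt * (x[i] ** 2 + r * u[i] ** 2
                   + Q[i] @ (x[i] - x) ** 2)  # pairwise state couplings
        x = x + dt * (a * x + b * u) + sigma * np.sqrt(dt) * rng.normal(size=N)
    return J

k = np.ones(N)                        # feedback gains: u_i = -k_i x_i
eta, delta = 0.05, 0.1                # step size, perturbation radius
for _ in range(300):
    grad = np.empty(N)
    for i in range(N):                # each player perturbs only its own gain
        e = np.zeros(N); e[i] = delta
        grad[i] = (cost(i, k + e) - cost(i, k - e)) / (2 * delta)
    k -= eta * grad                   # simultaneous independent PG step
print("learned feedback gains:", np.round(k, 3))
```

The defining feature of the independent scheme is visible in the update loop: player $i$'s gradient estimate perturbs only its own gain $k_i$ and queries only its own cost $J_i$, with no access to the other players' objectives or parameters.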