Autonomous pricing agents are widely deployed in online marketplaces, making algorithmic pricing a prominent application of multi-agent learning. Experimental studies often report collusive outcomes, but these findings typically rely on Q-learning in complete-information environments and lack rigorous convergence guarantees. In this paper, we study the stochastic learning dynamics of Regularized Robbins-Monro (RRM) algorithms in a Bayesian Bertrand competition with private costs. We show that this setting violates standard stability conditions, including monotonicity and the Minty variational inequality, rendering classical convergence results for gradient-based learning inapplicable. Despite this, we prove that Euclidean RRM algorithms converge almost surely to the unique, efficient Bayes-Nash equilibrium within a finite-dimensional approximation of the strategy space. By analyzing symmetric piecewise-linear pricing strategies in a duopoly, we explicitly construct a global Lyapunov function for the projected primal dynamics and establish global asymptotic stability of the equilibrium. Our analysis yields rigorous convergence guarantees for stochastic first-order learning algorithms in Bayesian Bertrand competition and provides a principled counterpoint to widespread claims of algorithmic collusion.
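For orientation, the update behind a Euclidean RRM scheme can be sketched as below. This is a minimal sketch in assumed notation, none of which appears in the abstract: $x_n$ denotes the vector of strategy parameters (for instance, the values of the piecewise-linear pricing rule at its grid points), $\mathcal{X}$ the feasible parameter set, $\hat v_n$ a stochastic estimate of the payoff gradient, $\gamma_n$ the step size, and $h$ the regularizer. The paper's exact formulation and step-size assumptions may differ.

```latex
% Generic regularized Robbins-Monro step (assumed notation)
y_{n+1} = y_n + \gamma_n \hat v_n, \qquad
x_{n+1} = Q(y_{n+1}), \qquad
Q(y) = \operatorname*{arg\,max}_{x \in \mathcal{X}} \{ \langle y, x \rangle - h(x) \}

% Euclidean case: h(x) = \tfrac12 \|x\|_2^2, so Q reduces to the Euclidean projection onto \mathcal{X}
Q(y) = \Pi_{\mathcal{X}}(y) = \operatorname*{arg\,min}_{x \in \mathcal{X}} \|x - y\|_2

% Typical Robbins-Monro step-size conditions
\sum_{n} \gamma_n = \infty, \qquad \sum_{n} \gamma_n^2 < \infty
```

The reduction in the Euclidean case follows because $\langle y, x \rangle - \tfrac12 \|x\|_2^2 = \tfrac12 \|y\|_2^2 - \tfrac12 \|x - y\|_2^2$, so maximizing the left-hand side over $\mathcal{X}$ is the same as minimizing the distance to $y$.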