Online Optimization Algorithms in Repeated Price Competition: Equilibrium Learning and Algorithmic Collusion

This paper asks whether uncoupled online learning algorithms converge to the Nash equilibrium in pricing competition or whether they can learn to collude. Algorithmic collusion has been debated among competition regulators, and it is highly relevant for buyers and sellers on online retail platforms. We formally analyze whether mean-based algorithms, a class of bandit algorithms relevant to algorithmic pricing, converge to the Nash equilibrium in repeated Bertrand oligopolies. Bandit algorithms observe only the agent's own profit for the price set in each step. In addition, we report the results of extensive experiments with different types of multi-armed bandit algorithms used for algorithmic pricing. We prove that mean-based algorithms converge to correlated rational strategy profiles, which coincide with the Nash equilibrium in versions of the Bertrand competition. Learning algorithms do not converge to a Nash equilibrium in general, so the fact that Bertrand pricing games are learnable with bandit algorithms is remarkable. Our numerical results suggest that widespread bandit algorithms that are not mean-based also converge to equilibrium, and that algorithmic collusion arises only with symmetric implementations of UCB or Q-learning, not when sellers use different algorithms. Moreover, the level of supra-competitive prices decreases as the number of sellers increases. Supra-competitive prices reduce consumer welfare, so it is important for consumers, sellers, and regulators to understand whether algorithms lead to collusion. We show that, for the important class of multi-armed bandit algorithms, such fears are overrated unless all sellers agree on a symmetric implementation of certain collusive algorithms.
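To make the bandit-feedback setting concrete, the following is a minimal sketch of a repeated Bertrand duopoly in which each seller runs Exp3, used here as one example of a bandit algorithm usually classified as mean-based. Everything specific in it is an assumption for illustration and is not taken from the paper: the unit-demand profit rule, the marginal cost of 1.0, the nine-point price grid, the exploration rate `gamma`, and the names `bertrand_profit`, `Exp3Seller`, and `simulate`. The paper's own experimental setup may differ.

```python
import math
import random


def bertrand_profit(own_price, rival_price, cost=1.0):
    """Assumed unit-demand rule: the cheaper seller serves the market, ties split it."""
    if own_price < rival_price:
        share = 1.0
    elif own_price == rival_price:
        share = 0.5
    else:
        share = 0.0
    return (own_price - cost) * share


class Exp3Seller:
    """Exp3 over a fixed price grid; sees only its own realized profit (bandit feedback)."""

    def __init__(self, prices, gamma, rng):
        self.prices = prices
        self.gamma = gamma
        self.rng = rng
        self.log_weights = [0.0] * len(prices)

    def probabilities(self):
        top = max(self.log_weights)  # subtract the max for numerical stability
        weights = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(weights)
        k = len(weights)
        # Mix the weight distribution with uniform exploration.
        return [(1 - self.gamma) * w / total + self.gamma / k for w in weights]

    def choose(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(len(probs)), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the price actually played is updated.
        self.log_weights[arm] += self.gamma * (reward / prob) / len(self.prices)


def simulate(rounds=20000, gamma=0.05, seed=0):
    """Run two Exp3 sellers against each other; return the grid and final mixed strategies."""
    prices = [1.0 + 0.25 * i for i in range(9)]  # assumed grid from cost (1.0) up to 3.0
    max_profit = prices[-1] - 1.0                # scales rewards into [0, 1]
    rng = random.Random(seed)
    sellers = [Exp3Seller(prices, gamma, rng) for _ in range(2)]
    for _ in range(rounds):
        picks = [s.choose() for s in sellers]    # both sellers price simultaneously
        for i, seller in enumerate(sellers):
            arm, prob = picks[i]
            rival_price = prices[picks[1 - i][0]]
            reward = bertrand_profit(prices[arm], rival_price) / max_profit
            seller.update(arm, prob, reward)
    return prices, [s.probabilities() for s in sellers]
```

Calling `prices, strategies = simulate()` and inspecting where each seller's probability mass sits on the grid gives a rough, informal check of the behavior the abstract describes. A single seeded run like this illustrates the setting only and does not substitute for the formal result or the reported experiments.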
