This paper addresses two issues in algorithmic collusion. First, we show that in a general class of symmetric games, including the Prisoner's Dilemma, Bertrand competition, and any (nonlinear) mixture of first- and second-price auctions, only strict Nash equilibrium (NE) is stochastically stable. Tacit collusion therefore stems from a failure to learn the NE due to insufficient learning, rather than from learning strategies that sustain collusive outcomes. Second, we study how algorithms come to collude in actual simulations with insufficient learning. Extensive exploration in early stages, combined with the discount factor, inflates Q-values; this interrupts the sequence of alternating price undercuts and triggers a bilateral rebound. Iterating this process makes the price paths resemble Edgeworth cycles. As both the exploration rate and the Q-values decline, the algorithms may, by coincidence, rebound bilaterally to a relatively high common price level and then become stuck there. Finally, we reconcile our reasoning with simulation outcomes in the literature, including optimistic initialization, market design, and algorithm design.
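The mechanism described above — early exploration plus discounting inflating Q-values in a repeated pricing game — can be illustrated with a minimal sketch. This is our own toy setup, not the paper's exact simulation: two stateless epsilon-greedy Q-learners set prices on a hypothetical discrete grid in a winner-take-all Bertrand duopoly, with a decaying exploration rate.

```python
import random

# Toy Bertrand duopoly with two stateless epsilon-greedy Q-learners.
# All parameter values here (grid, alpha, gamma, schedule) are illustrative
# assumptions, not the paper's calibration.
PRICES = [1, 2, 3, 4, 5]   # hypothetical discrete price grid
COST = 0                    # marginal cost
ALPHA, GAMMA = 0.1, 0.95    # learning rate, discount factor

def profit(p_own, p_rival):
    """Winner-take-all Bertrand payoff; demand is split on a tie."""
    if p_own < p_rival:
        return float(p_own - COST)
    if p_own == p_rival:
        return (p_own - COST) / 2.0
    return 0.0

def run(episodes=20000, seed=0):
    rng = random.Random(seed)
    # One Q-table per firm: a value for each candidate price.
    Q = [{p: 0.0 for p in PRICES} for _ in range(2)]
    for t in range(episodes):
        eps = max(0.01, 1.0 - t / episodes)  # decaying exploration rate
        acts = []
        for q in Q:
            if rng.random() < eps:
                acts.append(rng.choice(PRICES))   # explore
            else:
                acts.append(max(q, key=q.get))    # exploit current greedy price
        for i, q in enumerate(Q):
            r = profit(acts[i], acts[1 - i])
            # The discounted continuation term GAMMA * max Q inflates
            # Q-values early on, which is the effect discussed above.
            q[acts[i]] += ALPHA * (r + GAMMA * max(q.values()) - q[acts[i]])
    # Final greedy price of each firm.
    return [max(q, key=q.get) for q in Q]
```

Plotting the chosen prices over time in such a setup typically shows undercutting runs punctuated by joint rebounds; whether play settles at the competitive price or at a higher common level depends on the exploration schedule and the random seed.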