In an infinitely repeated general-sum pricing game, independent reinforcement learners may exhibit collusive behavior without any communication, raising concerns about algorithmic collusion. To better understand the learning dynamics, we incorporate agents' relative performance (RP) among competitors into their learning via experience replay (ER). Experimental results indicate that RP considerations play a critical role in shaping long-run outcomes. Agents that are averse to underperformance converge to the Bertrand-Nash equilibrium, while those more tolerant of underperformance tend to charge supra-competitive prices. The proposed relative ER also helps mitigate the overfitting issue in independent Q-learning. Additionally, the impact of relative ER varies with the number of agents and the choice of learning algorithm.
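The abstract does not spell out how relative performance enters experience replay, so the following is a minimal sketch under stated assumptions: two independent, stateless Q-learners repeatedly set prices in a symmetric logit-demand Bertrand game, and the replay priority of an experience grows with how far the agent's profit fell below its rivals' average, scaled by a hypothetical `aversion` parameter. All names (`RPQAgent`, `PRICES`, `aversion`, the demand model) are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque

import numpy as np

# Assumed symmetric Bertrand-style pricing game on a discrete price grid.
PRICES = np.linspace(1.0, 2.0, 6)   # candidate prices (assumption)
N_AGENTS = 2
COST = 1.0


def profits(price_idx):
    """Per-agent profit under a simplified logit demand split (assumption)."""
    p = PRICES[price_idx]
    share = np.exp(-p) / np.exp(-p).sum()     # cheaper prices capture more demand
    return (p - COST) * share


class RPQAgent:
    """Independent Q-learner with relative-performance-weighted experience replay.

    aversion > 1 replays experiences in which the agent underperformed its
    rivals more often (underperformance-averse); aversion < 1 tolerates them.
    """

    def __init__(self, n_actions, aversion=1.0, alpha=0.1, gamma=0.95,
                 eps=0.1, buffer_size=500, batch_size=16):
        self.q = np.zeros(n_actions)          # stateless repeated game: Q over own prices
        self.buffer = deque(maxlen=buffer_size)
        self.aversion = aversion
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.batch_size = batch_size

    def act(self):
        if random.random() < self.eps:
            return random.randrange(len(self.q))
        return int(np.argmax(self.q))

    def store(self, action, own_profit, rival_avg_profit):
        # Relative performance: gap between own profit and rivals' average profit.
        rp_gap = own_profit - rival_avg_profit
        # Replay priority grows with underperformance when aversion is high.
        priority = 1.0 + self.aversion * max(0.0, -rp_gap)
        self.buffer.append((action, own_profit, priority))

    def replay(self):
        if len(self.buffer) < self.batch_size:
            return
        priorities = np.array([e[2] for e in self.buffer])
        probs = priorities / priorities.sum()
        idx = np.random.choice(len(self.buffer), self.batch_size, p=probs)
        for i in idx:
            action, reward, _ = self.buffer[i]
            target = reward + self.gamma * self.q.max()
            self.q[action] += self.alpha * (target - self.q[action])


# Minimal training loop: independent learners repeatedly price against each other.
agents = [RPQAgent(len(PRICES), aversion=a) for a in (5.0, 5.0)]
for t in range(20000):
    actions = [ag.act() for ag in agents]
    pi = profits(np.array(actions))
    for i, ag in enumerate(agents):
        rival_avg = (pi.sum() - pi[i]) / (N_AGENTS - 1)
        ag.store(actions[i], pi[i], rival_avg)
        ag.replay()

print("Learned prices:", [PRICES[int(np.argmax(ag.q))] for ag in agents])
```

In this sketch, raising `aversion` makes underperforming experiences dominate the replayed batches, which is one plausible way to produce the qualitative pattern the abstract reports: underperformance-averse agents drift toward competitive pricing, while tolerant agents can sustain supra-competitive prices.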