We simulate the behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game in which a large population interacts anonymously. We show that the sender and the receiver converge to Nash equilibrium play. The informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate levels of the bias, matches the level predicted by the Pareto-optimal equilibrium or by the second-best one. These conclusions are robust to alternative specifications of the learning hyperparameters and of the game.
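The equilibrium benchmarks mentioned above (the Pareto-optimal and second-best equilibria) can be made concrete with a short calculation. The sketch below assumes the textbook uniform-quadratic specification of the Crawford and Sobel (1982) game: the state θ is uniform on [0, 1], the receiver's ideal action is θ and the sender's is θ + b, and both have quadratic losses. In that case an N-step partition equilibrium exists if and only if 2N(N − 1)b < 1, and its boundaries are a_i = i/N + 2b·i(i − N). The function names are hypothetical. Measuring informativeness as the residual variance E[Var(θ | message)] is also an assumption made for illustration; it is not necessarily the measure used in the paper.

```python
from __future__ import annotations


def max_partition_size(b: float) -> int:
    """Largest N with 2*N*(N-1)*b < 1 (uniform-quadratic case, bias b > 0).

    The equilibrium with this many steps is the ex ante Pareto-optimal one;
    N - 1 steps gives the second-best equilibrium.
    """
    if b <= 0:
        raise ValueError("bias must be positive")
    n = 1
    # Keep adding steps while an (n+1)-step equilibrium still exists.
    while 2 * (n + 1) * n * b < 1:
        n += 1
    return n


def partition_boundaries(n: int, b: float) -> list[float]:
    """Boundaries a_0 = 0 < a_1 < ... < a_n = 1 of the n-step equilibrium."""
    if 2 * n * (n - 1) * b >= 1:
        raise ValueError(f"no {n}-step equilibrium exists for bias {b}")
    # Solves a_{i+1} - a_i = a_i - a_{i-1} + 4b with a_0 = 0 and a_n = 1.
    return [i / n + 2 * b * i * (i - n) for i in range(n + 1)]


def residual_variance(bounds: list[float]) -> float:
    """E[Var(theta | message)] for a uniform state pooled into these intervals.

    An interval of length L has probability L and conditional variance L**2 / 12.
    Lower values mean more informative cheap talk; babbling gives 1/12.
    """
    return sum((hi - lo) ** 3 / 12 for lo, hi in zip(bounds, bounds[1:]))


def equilibrium_benchmarks(b: float) -> dict[int, float]:
    """Residual variance of every partition equilibrium, keyed by step count."""
    return {
        n: residual_variance(partition_boundaries(n, b))
        for n in range(1, max_partition_size(b) + 1)
    }


if __name__ == "__main__":
    for bias in (0.02, 0.05, 0.1, 0.2, 0.3):
        bench = equilibrium_benchmarks(bias)
        best = max(bench)  # step count of the Pareto-optimal equilibrium
        second = bench.get(best - 1)  # None when only babbling exists
        print(f"b={bias}: N(b)={best}, best={bench[best]:.4f}, second-best={second}")
```

A simulated sender-receiver pair could then be compared against these numbers by computing the same residual variance from the messages the learned sender actually sends. As the bias grows, N(b) falls, and for b ≥ 1/4 only the babbling equilibrium remains.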