We simulate the behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game where a large population interacts anonymously. We show that the sender and the receiver converge to Nash equilibrium play. The informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate levels of the bias, it matches the level predicted by the Pareto-optimal equilibrium or by the second-best one. These conclusions are robust to alternative specifications of the learning hyperparameters and of the game.
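To make the setup concrete, the following is a minimal sketch of how two independent, memoryless learners might be simulated in a discretised version of the game. It is not the paper's implementation. It assumes the uniform-quadratic specification commonly used for this game (states drawn uniformly, quadratic losses, sender bias `bias`), epsilon-greedy exploration with an exponential decay, and a simple averaging update with no continuation value. The grid size, learning rate, decay schedule and the names `simulate` and `greedy_messages` are all hypothetical.

```python
import math
import random


def simulate(n=5, bias=0.1, steps=20000, alpha=0.1, seed=0):
    """Two independent, memoryless learners in a discretised cheap-talk game.

    Assumed setup (not taken from the paper): n states, n messages and
    n actions; states and actions lie on an even grid over [0, 1].
    """
    rng = random.Random(seed)
    grid = [i / (n - 1) for i in range(n)]
    # Value tables: sender indexed by (state, message),
    # receiver indexed by (message, action).
    q_s = [[0.0] * n for _ in range(n)]
    q_r = [[0.0] * n for _ in range(n)]

    def choose(row, eps):
        # Epsilon-greedy: explore uniformly with probability eps,
        # otherwise pick the entry with the highest current value.
        if rng.random() < eps:
            return rng.randrange(n)
        return max(range(n), key=row.__getitem__)

    for t in range(steps):
        eps = math.exp(-5.0 * t / steps)  # assumed exploration decay
        s = rng.randrange(n)              # state drawn uniformly
        m = choose(q_s[s], eps)           # sender sees the state, sends a message
        a = choose(q_r[m], eps)           # receiver sees the message, picks an action
        u_s = -(grid[a] - grid[s] - bias) ** 2  # sender's quadratic loss (assumed)
        u_r = -(grid[a] - grid[s]) ** 2         # receiver's quadratic loss (assumed)
        # Memoryless update: only the current payoff enters, no future term.
        q_s[s][m] += alpha * (u_s - q_s[s][m])
        q_r[m][a] += alpha * (u_r - q_r[m][a])
    return q_s, q_r


def greedy_messages(q_s):
    """Message the sender would send in each state under greedy play."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_s]
```

Counting the distinct entries of `greedy_messages(q_s)` gives a crude proxy for how informative the learned messaging rule is; rerunning `simulate` over a range of `bias` values is one way to explore how that count changes with the bias.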