We simulate the behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game where a large population interacts anonymously. We show that sender and receiver converge to Nash equilibrium play. The informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate levels of the bias, it matches the level predicted by the Pareto-optimal equilibrium or by the second-best one. Conclusions are robust to alternative specifications of the learning hyperparameters and of the game.
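As a rough illustration of this kind of setup, and not the implementation used in the paper, the sketch below lets two independent learners play a discretised cheap-talk game. Everything specific in it is an assumption made for the example: the quadratic-loss payoffs (taken from the standard uniform-quadratic example of Crawford and Sobel), the shared grid for types, messages and actions, the epsilon-greedy exploration schedule, and the hypothetical names `simulate` and `messages_used`. "Memoryless" is read here as each learner conditioning only on its current observation (the type for the sender, the message for the receiver), with no continuation value in the update.

```python
import numpy as np


def simulate(bias=0.1, n=11, steps=50_000, alpha=0.1, eps_decay=1e-4, seed=0):
    """Two independent, memoryless learners in a discretised cheap-talk game.

    All parameter names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n)   # shared grid for types, messages, actions
    q_sender = np.zeros((n, n))       # rows: observed type, cols: message sent
    q_receiver = np.zeros((n, n))     # rows: message received, cols: action

    for t in range(steps):
        eps = np.exp(-eps_decay * t)  # exploration rate decays over time
        theta = int(rng.integers(n))  # nature draws the sender's type uniformly

        # Sender: epsilon-greedy choice of message given the type.
        if rng.random() < eps:
            m = int(rng.integers(n))
        else:
            m = int(np.argmax(q_sender[theta]))

        # Receiver: epsilon-greedy choice of action given the message.
        if rng.random() < eps:
            a = int(rng.integers(n))
        else:
            a = int(np.argmax(q_receiver[m]))

        # Quadratic-loss payoffs; the sender's ideal action is shifted by the bias.
        u_sender = -(grid[a] - grid[theta] - bias) ** 2
        u_receiver = -(grid[a] - grid[theta]) ** 2

        # One-shot updates: no next state, so no discounted continuation term.
        q_sender[theta, m] += alpha * (u_sender - q_sender[theta, m])
        q_receiver[m, a] += alpha * (u_receiver - q_receiver[m, a])

    return q_sender, q_receiver, grid


def messages_used(q_sender):
    """Number of distinct messages in the sender's greedy policy."""
    return len(set(np.argmax(q_sender, axis=1).tolist()))


if __name__ == "__main__":
    for b in (0.05, 0.1, 0.3):
        q_s, _, _ = simulate(bias=b)
        print(f"bias={b}: distinct messages used = {messages_used(q_s)}")
```

The count of distinct messages in the greedy policy is only a crude proxy for informativeness. A closer comparison with the equilibrium partitions would look at how the receiver's actions vary with the type.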