We simulate the behavior of independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We show that a sender and a receiver training together converge to strategies close to the ex-ante optimal equilibrium of the game. Hence, communication takes place to the largest extent predicted by Nash equilibrium. The conclusion is robust to alternative specifications of the learning hyperparameters and of the game. We discuss implications for theories of equilibrium selection in information transmission games, for work on emergent communication among algorithms in computer science, and for the economics of collusion in markets populated by artificially intelligent agents.