We simulate the behaviour of independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We show that a sender and a receiver trained together converge to strategies approximating the ex-ante optimal equilibrium of the game. Communication occurs to the largest extent predicted by Nash equilibrium. The conclusion is robust to alternative specifications of the learning hyperparameters and of the game. We discuss implications for theories of equilibrium selection in information transmission games, for work on emergent communication among algorithms in computer science, and for the economics of collusion in markets populated by artificially intelligent agents.