In human society, the conflict between self-interest and collective well-being often obstructs efforts to achieve shared welfare. Related concepts such as the Tragedy of the Commons and social dilemmas frequently manifest in our daily lives. As artificial agents increasingly serve as autonomous proxies for humans, we propose using multi-agent reinforcement learning (MARL) to address this issue: learning policies that maximise collective returns even when individual agents' interests conflict with the collective interest. Traditional MARL solutions involve sharing rewards, values, and policies, or designing intrinsic rewards to encourage agents to learn collectively optimal policies. We introduce a novel MARL approach based on Suggestion Sharing (SS), in which agents exchange only action suggestions. This method enables effective cooperation without the need to design intrinsic rewards, achieving strong performance while revealing less private information than sharing rewards, values, or policies. Our theoretical analysis establishes a bound on the discrepancy between the collective and individual objectives, showing how sharing suggestions can align agents' behaviours with the collective objective. Experimental results demonstrate that SS performs competitively with baselines that rely on value or policy sharing or intrinsic rewards.