Growing concerns about the safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents. A promising approach is learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many existing studies use simulated social dilemma environments to study the interactions of independent learning agents. However, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing some outcome over time) or norm-based (i.e., focused on conforming to a specific norm here and now). The extent to which such moral heterogeneity in a population affects agents' co-development is not well understood. In this paper, we study the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using a Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in a population affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain classes of moral agents are able to steer selfish agents towards more cooperative behavior.
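To make the setting concrete, the following is a minimal illustrative sketch of a Prisoner's Dilemma with partner selection and three agent types. It is not the implementation used in this paper: the payoff values, the reward definitions (`selfish_reward`, `consequentialist_reward`, `norm_based_reward`), the penalty size, and the `select_partner` heuristic are all assumptions chosen for illustration. In the study itself, agents learn both whom to select and how to act, whereas here partner selection is a fixed rule.

```python
import random

# Hypothetical payoffs satisfying the usual Prisoner's Dilemma ordering
# T > R > P > S and 2R > T + S (values are assumptions, not from the paper).
T, R, P, S = 5, 3, 1, 0
PAYOFFS = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def selfish_reward(my_action, their_action, their_prev_action):
    """Selfish agent: reward is its own game payoff."""
    return PAYOFFS[(my_action, their_action)][0]


def consequentialist_reward(my_action, their_action, their_prev_action):
    """Assumed consequentialist agent: reward is the joint payoff of the pair."""
    mine, theirs = PAYOFFS[(my_action, their_action)]
    return mine + theirs


def norm_based_reward(my_action, their_action, their_prev_action, penalty=-5):
    """Assumed norm-based agent: penalized for defecting against a partner
    who cooperated on their previous move; otherwise receives zero."""
    if my_action == "D" and their_prev_action == "C":
        return penalty
    return 0


def select_partner(agent, population, last_actions, rng):
    """Illustrative fixed selection rule: prefer partners whose last action
    was cooperation; fall back to any other agent if none cooperated."""
    candidates = [p for p in population if p != agent]
    cooperators = [p for p in candidates if last_actions.get(p) == "C"]
    return rng.choice(cooperators or candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    population = ["a", "b", "c"]
    last_actions = {"b": "D", "c": "C"}
    partner = select_partner("a", population, last_actions, rng)
    print("a selects", partner)
    # Same joint action, scored under the three assumed reward definitions.
    for fn in (selfish_reward, consequentialist_reward, norm_based_reward):
        print(fn.__name__, fn("D", "C", "C"))
```

Under these assumed definitions, defecting against a cooperator is the best single-step outcome for the selfish agent, is worse than mutual cooperation for the consequentialist agent (5 versus 6), and is penalized for the norm-based agent. This difference in reward signals is the kind of moral heterogeneity the abstract refers to.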