Growing concerns about the safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents; a promising approach is learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents; however, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., considering a combination of different virtues). The extent to which agents' co-development may be affected by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents are able to steer selfish agents towards more cooperative behavior.
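One way the moral agent types mentioned above could be operationalized as reward shaping in an Iterated Prisoner's Dilemma is sketched below. All payoff values, function names, and reward definitions here are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# A minimal, hypothetical sketch of moral reward shaping in a Prisoner's
# Dilemma. Payoff values and reward definitions are illustrative
# assumptions, not taken from the paper.

# Standard PD payoffs: (my_payoff, opponent_payoff); actions: 0=Cooperate, 1=Defect
PAYOFFS = {
    (0, 0): (3, 3),  # mutual cooperation (reward R)
    (0, 1): (0, 5),  # I cooperate, opponent defects (sucker S vs. temptation T)
    (1, 0): (5, 0),  # I defect, opponent cooperates
    (1, 1): (1, 1),  # mutual defection (punishment P)
}

def selfish_reward(my_action, opp_action):
    """A purely payoff-maximizing (selfish) agent's reward."""
    return PAYOFFS[(my_action, opp_action)][0]

def consequentialist_reward(my_action, opp_action):
    """Consequentialist: values the collective outcome (sum of both payoffs)."""
    mine, theirs = PAYOFFS[(my_action, opp_action)]
    return mine + theirs

def norm_based_reward(my_action, opp_action, penalty=4):
    """Norm-based: own payoff, penalized for violating a 'do not defect' norm."""
    return PAYOFFS[(my_action, opp_action)][0] - (penalty if my_action == 1 else 0)

def virtue_based_reward(my_action, opp_action, weight=0.5):
    """Virtue-based: blends own payoff with a bonus for the 'virtue' of cooperating."""
    mine, _ = PAYOFFS[(my_action, opp_action)]
    virtue_bonus = 3 if my_action == 0 else 0
    return (1 - weight) * mine + weight * virtue_bonus
```

Under such shaping, independent RL agents all observe the same game but optimize different reward signals, which is what allows morally heterogeneous learning dynamics to emerge in the first place.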