We propose credo, a model of multi-objective optimization for agents that are configured into multiple groups (i.e., teams) within a system. Credo regulates how agents optimize their behavior for the groups they belong to. We evaluate credo with reinforcement learning agents in challenging social dilemmas. Our results indicate that the interests of teammates, or of the entire system, need not be fully aligned to achieve globally beneficial outcomes. We identify two scenarios without full common interest that achieve high equality and significantly higher mean population rewards than when the interests of all agents are aligned.
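As a rough illustration of how a credo might regulate what an agent optimizes, the sketch below assumes a credo is a triple of weights over an agent's own reward, its team's mean reward, and the system's mean reward, with the weights summing to 1. This parameterization, the function name `credo_reward`, and the toy rewards are assumptions made for illustration; the abstract itself does not specify them.

```python
from statistics import mean


def credo_reward(rewards, teams, agent, credo):
    """Mix self, team, and system rewards for one agent.

    `credo` is a hypothetical (self, team, system) weight triple that
    sums to 1; this form is assumed here, not taken from the abstract.
    """
    w_self, w_team, w_system = credo
    if abs(w_self + w_team + w_system - 1.0) > 1e-9:
        raise ValueError("credo weights must sum to 1")

    # Find the team this agent belongs to (each agent is on exactly one team here).
    team = next(t for t in teams if agent in t)
    team_reward = mean(rewards[a] for a in team)
    system_reward = mean(rewards.values())

    return (w_self * rewards[agent]
            + w_team * team_reward
            + w_system * system_reward)


# Toy data: four agents split into two teams.
rewards = {"a": 4.0, "b": 0.0, "c": 2.0, "d": 2.0}
teams = [("a", "b"), ("c", "d")]

selfish = credo_reward(rewards, teams, "a", (1.0, 0.0, 0.0))
mixed = credo_reward(rewards, teams, "a", (0.5, 0.5, 0.0))
system_only = credo_reward(rewards, teams, "a", (0.0, 0.0, 1.0))
```

Under this assumed form, a fully self-focused credo recovers the agent's own reward, while shifting weight toward the team or system blends in the corresponding group averages. Intermediate settings such as `(0.5, 0.5, 0.0)` correspond to the partially aligned interests the abstract discusses.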