Innate values describe an agent's intrinsic motivations: the inherent interests and preferences that lead it to pursue goals and that drive it to develop diverse skills to satisfy its various needs. Reinforcement learning (RL) is, in essence, learning from interaction through reward-driven behavior (for example, utility-driven behavior), much as natural agents do, which makes it a well-suited model for describing the innate-values-driven (IV) behaviors of AI agents. In multi-agent systems (MAS) in particular, making AI agents aware of how to balance group utilities against system costs and to satisfy group members' needs during cooperation is a crucial problem if individual agents are to learn to support their community and integrate into human society in the long term. This paper proposes a hierarchical compound intrinsic-value reinforcement learning model, termed innate-values-driven reinforcement learning (IVRL), to describe the complex behaviors of multi-agent interaction during cooperation. We implement the IVRL architecture in the StarCraft Multi-Agent Challenge (SMAC) environment and compare the cooperative performance of three innate-value agent characters (Coward, Neutral, and Reckless) under three benchmark multi-agent RL algorithms: QMIX, IQL, and QTRAN. The results demonstrate that, by organizing individuals' various needs rationally, the group can achieve better performance at lower cost.