Before taking actions in an environment with more than one intelligent agent, an autonomous agent may benefit from reasoning about the other agents and from using a notion of guarantee or confidence about the system's behavior. In this article, we propose CAMMARL, a novel multi-agent reinforcement learning (MARL) algorithm that models the actions of other agents in different situations as confidence sets, i.e., sets that contain their true actions with high probability. We then use these estimates to inform an agent's decision-making. To estimate such sets, we use conformal prediction, which not only yields an estimate of the most probable outcome but also quantifies the uncertainty involved. For instance, we can predict a set that provably contains the true outcome with a chosen high probability (e.g., 95%). Through several experiments on two fully cooperative multi-agent tasks, we show that CAMMARL improves an autonomous agent's capabilities in MARL by modeling conformal prediction sets over the behavior of the other agents in the environment and using these estimates to enhance its policy learning. All code is available at https://github.com/Nikunj-Gupta/conformal-agent-modelling.
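To make the idea of a confidence set over another agent's actions concrete, the following is a minimal sketch of standard split conformal prediction for a discrete action space. It is not taken from the CAMMARL repository: the nonconformity score `1 - p(action)`, the function names `calibrate_threshold`, `prediction_set`, and `sample_situation`, and the toy environment are all hypothetical choices for illustration. The coverage guarantee assumes calibration and test situations are exchangeable, and it holds even though the toy agent model is deliberately miscalibrated.

```python
import math
import random


def calibrate_threshold(cal_probs, cal_actions, alpha=0.05):
    """Split conformal calibration (illustrative sketch, not CAMMARL's exact procedure).

    cal_probs:   predicted action distributions for another agent, one per situation
    cal_actions: the actions that agent actually took in those situations
    Returns qhat, the conformal quantile of the scores 1 - p(true action).
    """
    n = len(cal_actions)
    scores = sorted(1.0 - p[a] for p, a in zip(cal_probs, cal_actions))
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too little calibration data: the set must contain every action
    return scores[k - 1]


def prediction_set(probs, qhat):
    """All actions whose nonconformity score 1 - p(a) does not exceed qhat."""
    return [a for a, p in enumerate(probs) if 1.0 - p <= qhat]


def sample_situation(rng, n_actions=5):
    """Hypothetical toy situation: a random true policy for the other agent plus an imperfect model of it."""
    w = [rng.random() for _ in range(n_actions)]
    total = sum(w)
    true_policy = [x / total for x in w]
    # The agent model is deliberately miscalibrated: it mixes the true policy with a uniform one.
    model = [0.7 * p + 0.3 / n_actions for p in true_policy]
    action = rng.choices(range(n_actions), weights=true_policy)[0]
    return model, action


rng = random.Random(0)

# Calibrate on situations where the other agent's actual action was observed.
cal = [sample_situation(rng) for _ in range(1000)]
qhat = calibrate_threshold([m for m, _ in cal], [a for _, a in cal], alpha=0.05)

# On fresh situations, the sets should contain the true action about 95% of the time.
test = [sample_situation(rng) for _ in range(5000)]
sets = [prediction_set(m, qhat) for m, _ in test]
coverage = sum(a in s for s, (_, a) in zip(sets, test)) / len(test)
avg_size = sum(len(s) for s in sets) / len(sets)
print(f"qhat={qhat:.3f} coverage={coverage:.3f} avg set size={avg_size:.2f}")
```

In an agent-modeling setting, a set like `prediction_set(model_probs, qhat)` could then be encoded (for example, as a binary mask over actions) and supplied as an extra input to the learning agent's policy; how CAMMARL itself consumes the sets is described in the paper and repository, not by this sketch.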