Before acting in an environment that contains more than one intelligent agent, an autonomous agent may benefit from reasoning about the other agents and from having some guarantee, or measure of confidence, about the behavior of the system. In this article, we propose CAMMARL, a novel multi-agent reinforcement learning (MARL) algorithm that models the actions of other agents in different situations as confident sets, i.e., sets that contain their true actions with high probability. We then use these estimates to inform an agent's decision-making. To estimate such sets, we use conformal prediction, which yields not only an estimate of the most probable outcome but also a quantification of the associated uncertainty. For instance, we can predict a set that provably covers the true action with high probability (e.g., 95%). Through several experiments on two fully cooperative multi-agent tasks, we show that CAMMARL improves the capabilities of an autonomous agent in MARL by modeling conformal prediction sets over the behavior of other agents in the environment and using these estimates to enhance its policy learning.
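To make the idea of a set that covers another agent's true action with high probability concrete, the following is a minimal sketch of split conformal prediction over a discrete action space. It is an illustration under stated assumptions, not CAMMARL's implementation: the function names `conformal_threshold` and `action_prediction_set`, the nonconformity score `1 - p(true action)`, and the toy calibration data are all hypothetical choices made here for the example.

```python
import math
import numpy as np


def conformal_threshold(cal_probs, cal_actions, alpha=0.05):
    """Calibrate a score threshold from held-out (probabilities, true action) pairs.

    cal_probs:   (n, num_actions) predicted action probabilities for the other agent
    cal_actions: (n,) indices of the actions that agent actually took
    alpha:       target miscoverage, e.g. 0.05 for 95% coverage
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_actions = np.asarray(cal_actions, dtype=int)
    n = len(cal_actions)
    # Assumed nonconformity score: one minus the probability of the true action.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_actions])
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: fall back to including every action.
        return 1.0
    return float(scores[k - 1])


def action_prediction_set(probs, threshold):
    """Return the indices of all actions whose score does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(1.0 - probs <= threshold)


# Toy calibration data (hypothetical): 4 observations over 3 possible actions.
cal_probs = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1],
             [0.3, 0.3, 0.4],
             [0.6, 0.3, 0.1]]
cal_actions = [0, 1, 2, 0]

threshold = conformal_threshold(cal_probs, cal_actions, alpha=0.5)
print(action_prediction_set([0.65, 0.30, 0.05], threshold))
```

The coverage guarantee of this procedure relies on the calibration and test samples being exchangeable. A learning agent could then consume the resulting set, for example as a binary mask over the other agent's actions, as an additional input to its policy; how CAMMARL itself encodes and uses the sets is not specified in the text above.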