Efficient exploration is critical for multiagent systems to discover coordinated strategies, particularly in open-ended domains such as search and rescue or planetary surveying. However, when exploration is encouraged only at the individual agent level, it often leads to redundancy, as agents act without awareness of how their teammates are exploring. In this work, we introduce Counterfactual Conditional Likelihood (CCL) rewards, which score each agent's exploration by isolating its unique contribution to team exploration. Unlike prior methods that reward agents solely for the novelty of their individual observations, CCL emphasizes observations that are informative with respect to the joint exploration of the team. Experiments in continuous multiagent domains show that CCL rewards accelerate learning in domains with sparse team rewards, where most joint actions yield zero reward, and are particularly effective in tasks that require tight coordination among agents.
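The core idea of scoring each agent by its unique contribution to team exploration can be illustrated with a toy counterfactual bonus. The sketch below is an assumption for illustration only, not the paper's CCL formulation: it uses a hypothetical count-based novelty measure in place of a learned conditional likelihood, and all function names are made up. Each agent's reward is the team's novelty with its observation included minus the novelty with it removed, so redundant observations earn nothing.

```python
from collections import Counter

def counterfactual_exploration_rewards(visit_counts, team_obs):
    """Toy counterfactual exploration bonus (hypothetical stand-in for CCL).

    visit_counts: Counter mapping discretized states to past visit totals.
    team_obs: list with each agent's current discretized observation.
    Returns one reward per agent: its marginal contribution to team novelty.
    """
    def team_novelty(obs_list):
        # Novelty of the joint step: inverse-sqrt count bonus summed over
        # the DISTINCT states the team covers (duplicates add nothing).
        return sum(1.0 / (1 + visit_counts[o]) ** 0.5 for o in set(obs_list))

    full = team_novelty(team_obs)
    rewards = []
    for i in range(len(team_obs)):
        # Counterfactual: remove agent i's observation from the joint step.
        without_i = team_obs[:i] + team_obs[i + 1:]
        rewards.append(full - team_novelty(without_i))
    return rewards
```

With empty visit counts, two agents observing the same state each get a reward of zero (either one could be removed without losing coverage), while two agents observing distinct states each receive the full per-state bonus. This captures the abstract's point that individual-level novelty alone cannot penalize redundant exploration.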