Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing

We consider a decentralized formulation of the active hypothesis testing (AHT) problem, in which multiple agents gather noisy observations from the environment in order to identify the correct hypothesis. At each time step, each agent may select a sampling action. Different actions yield observations drawn from different distributions, each associated with a specific hypothesis. The agents collaborate on the task and may exchange messages over a rate-limited communication channel. The objective is to devise a multi-agent policy that minimizes the Bayes risk, which comprises both the cost of sampling and the joint terminal cost incurred by the agents upon declaring a hypothesis. Deriving optimal structured policies for AHT problems is generally mathematically intractable, even for a single agent. As a result, recent work has turned to deep learning methods, which have shown significant success in single-agent settings. In this paper, we tackle the multi-agent AHT formulation by introducing a novel algorithm based on deep multi-agent reinforcement learning. In this algorithm, named Multi-Agent Reinforcement Learning for AHT (MARLA), each agent at each time step maps its state to an action (sampling rule or stopping rule) using a trained deep neural network, with the goal of minimizing the Bayes risk. We present experimental results showing that agents trained with MARLA learn collaborative strategies and improve performance. Furthermore, we demonstrate that MARLA outperforms single-agent learning approaches. Finally, we provide an open-source implementation of the MARLA framework for the benefit of researchers and developers in related domains.
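To make the quantities in the abstract concrete, the following is a minimal single-agent sketch of an AHT episode. It is not the MARLA algorithm. It only illustrates a Bayesian belief update over hypotheses and a Bayes-risk-style cost (sampling cost plus terminal cost). The unit-variance Gaussian observation model, the 0-1 terminal cost, the confidence-threshold stopping rule, the uniformly random sampling rule, and all names (`update_belief`, `run_episode`, `means`, `sample_cost`, `threshold`) are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution (assumed observation model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, action, observation, means):
    """Bayes update of the posterior over hypotheses.

    means[h][a] is the (hypothetical) observation mean under
    hypothesis h when sampling action a is taken.
    """
    unnorm = [
        b * gaussian_pdf(observation, means[h][action])
        for h, b in enumerate(belief)
    ]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_episode(true_h, means, sample_cost=0.01, threshold=0.95,
                max_steps=200, rng=None):
    """Simulate one episode and return (cost, declared hypothesis, steps).

    cost = steps * sample_cost + terminal cost, where the terminal cost
    is 0 for a correct declaration and 1 otherwise (an assumption).
    """
    rng = rng or random.Random(0)
    n_hyp, n_act = len(means), len(means[0])
    belief = [1.0 / n_hyp] * n_hyp  # uniform prior

    steps = 0
    # Assumed stopping rule: stop once one hypothesis is confident enough.
    while max(belief) < threshold and steps < max_steps:
        # Placeholder sampling rule: a learned policy would choose here.
        action = rng.randrange(n_act)
        obs = rng.gauss(means[true_h][action], 1.0)
        belief = update_belief(belief, action, obs, means)
        steps += 1

    declared = belief.index(max(belief))
    terminal_cost = 0.0 if declared == true_h else 1.0
    return steps * sample_cost + terminal_cost, declared, steps


if __name__ == "__main__":
    # Two hypotheses, two sampling actions (illustrative values only).
    means = [[0.0, 1.0], [1.0, 0.0]]
    cost, declared, steps = run_episode(true_h=0, means=means)
    print(f"declared={declared} steps={steps} cost={cost:.2f}")
```

Averaging the returned cost over many episodes and over a prior on `true_h` gives a Monte Carlo estimate of the Bayes risk for this toy policy. In the setting the abstract describes, the random sampling rule and the fixed threshold would be replaced by each agent's learned mapping from state to action.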
