We address the problem of monitoring a set of binary stochastic processes and generating an alert when the number of anomalies among them exceeds a threshold. To this end, the decision-maker selects and probes a subset of the processes to obtain noisy estimates of their states (normal or anomalous). Based on the received observations, the decision-maker first determines whether to declare that the number of anomalies has exceeded the threshold or to continue taking observations. If it continues, it then decides whether to collect observations at the next time instant or to defer collection to a later time. If it chooses to collect observations, it further determines the subset of processes to be probed. To devise this three-step sequential decision-making process, we use a Bayesian formulation in which we learn the posterior probability of the states of the processes. Using the posterior probability, we construct a Markov decision process and solve it using deep actor-critic reinforcement learning. Via numerical experiments, we demonstrate the superior performance of our algorithm compared to traditional model-based algorithms.
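To make the Bayesian step concrete, the following is a minimal sketch of how a posterior over the process states might be updated from noisy probes and then turned into an alert probability. It is not the authors' implementation: the abstract does not specify the observation or dependence model, so the sketch assumes statistically independent processes, static states, and observations that are flipped with a known probability `flip_prob`. The function names `update_posterior` and `prob_count_exceeds` are hypothetical.

```python
import numpy as np


def update_posterior(belief, probed, obs, flip_prob):
    """Bayes update of P(anomalous) for the probed processes.

    belief    : array of shape (N,), current P(state_i = anomalous)
    probed    : indices of the probed processes (each at most once)
    obs       : observed bits for those processes (1 = looks anomalous)
    flip_prob : ASSUMED probability that an observation is flipped
    """
    belief = np.asarray(belief, dtype=float).copy()
    probed = np.asarray(probed, dtype=int)
    obs = np.asarray(obs, dtype=int)

    # Likelihood of each observation given an anomalous / normal state.
    like_anom = np.where(obs == 1, 1.0 - flip_prob, flip_prob)
    like_norm = np.where(obs == 0, 1.0 - flip_prob, flip_prob)

    prior = belief[probed]
    numer = like_anom * prior
    belief[probed] = numer / (numer + like_norm * (1.0 - prior))
    return belief


def prob_count_exceeds(belief, threshold):
    """P(number of anomalies > threshold), ASSUMING independent processes.

    The anomaly count is then Poisson-binomial; its distribution is built
    by convolving the per-process Bernoulli distributions.
    """
    dist = np.array([1.0])
    for p in belief:
        dist = np.convolve(dist, [1.0 - p, p])
    return float(dist[threshold + 1:].sum())


# Example: three processes with a uniform prior; probe process 0 once.
belief = update_posterior([0.5, 0.5, 0.5], probed=[0], obs=[1], flip_prob=0.2)
alert_prob = prob_count_exceeds(belief, threshold=1)
```

In the formulation described above, a belief vector of this kind would serve as the state of the Markov decision process, with the stop, defer, and probe-subset decisions chosen by the learned actor-critic policy rather than by a fixed rule on `alert_prob`.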