Sequential anomaly identification with observation control under generalized error metrics

The problem of sequential anomaly detection and identification is considered, where multiple data sources are monitored simultaneously and the goal is to identify in real time those, if any, that exhibit "anomalous" statistical behavior. An upper bound is imposed on the number of data sources that can be sampled at each sampling instant, but the decision maker selects which ones to sample based on the data collected so far. Thus, in this context, a policy consists not only of a stopping rule and a decision rule, which determine when sampling should be terminated and which sources to identify as anomalous upon stopping, but also of a sampling rule, which determines which sources to sample at each time instant subject to the sampling constraint. Two distinct formulations are considered, which require control of different "generalized" error metrics. The first tolerates a user-specified number of errors of any kind, whereas the second tolerates distinct, user-specified numbers of false positives and false negatives. For each formulation, a universal asymptotic lower bound on the expected stopping time is established as the error probabilities go to 0, and it is shown to be attained by a policy that combines the stopping and decision rules proposed in the full-sampling case with a probabilistic sampling rule that achieves a specific long-run sampling frequency for each source. Moreover, the first-order asymptotically optimal expected stopping time is compared in simulation studies with its finite-regime counterpart, and the impact of the sampling constraint and of the tolerance to errors is assessed.
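As an illustration of what a probabilistic sampling rule with prescribed long-run frequencies could look like, the following is a minimal sketch. It is not the rule constructed in the paper. It uses systematic sampling, a standard survey-sampling technique swapped in here for illustration. The function name `sample_sources`, its parameters, and the example frequencies are all hypothetical. The sketch assumes each target frequency lies in [0, 1] and that the frequencies sum to at most the per-instant sampling budget.

```python
import math
import random


def sample_sources(freqs, max_per_step, rng=random):
    """Pick the sources to sample at one time instant (illustrative only).

    freqs[i] is the target long-run sampling frequency of source i
    (hypothetical input, assumed to lie in [0, 1]); max_per_step is the
    assumed upper bound on the number of sources sampled per instant.
    Systematic sampling selects source i with probability exactly freqs[i]
    and never selects more than ceil(sum(freqs)) sources at once.
    """
    if any(f < 0.0 or f > 1.0 for f in freqs):
        raise ValueError("each frequency must lie in [0, 1]")
    if sum(freqs) > max_per_step:
        raise ValueError("frequencies must sum to at most max_per_step")

    u = rng.random()  # a single uniform offset in [0, 1)
    chosen = []
    lower = 0.0
    for i, f in enumerate(freqs):
        upper = lower + f
        # Source i is selected if a grid point u + k (k an integer) falls in
        # (lower, upper]; since f <= 1 there is at most one such point.
        if math.floor(upper - u) > math.floor(lower - u):
            chosen.append(i)
        lower = upper
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, 0.25, 0.75, 0.5]  # hypothetical frequencies, sum = 2
    counts = [0] * len(target)
    steps = 20000
    for _ in range(steps):
        for i in sample_sources(target, max_per_step=2, rng=rng):
            counts[i] += 1
    # Empirical frequencies should be close to the targets.
    print([round(c / steps, 3) for c in counts])
```

Because the draws at different time instants are independent, the law of large numbers implies that the empirical sampling frequency of each source converges to its target. An adaptive policy of the kind described above would presumably recompute the targets from the collected data, which this sketch does not attempt.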
