Sequential Outlier Hypothesis Testing under Universality Constraints

From arXiv. v2 was published at ITW 2024; v3 is the full version, with results for both known and unknown numbers of outliers; v4 presents the results for a known number of outliers.

We revisit sequential outlier hypothesis testing and derive bounds on achievable exponents when both the nominal and anomalous distributions are \emph{unknown}. The task of outlier hypothesis testing is to identify the set of outliers, i.e., the sequences generated from an anomalous distribution, among all observed sequences, where the remaining majority are generated from a nominal distribution. In the sequential setting, one obtains one sample from each sequence per unit time until a reliable decision can be made. For the case with exactly one outlier, our exponent bounds are tight, providing an exact large deviations characterization of sequential tests and strengthening a previous result of Li, Nitinawarat and Veeravalli (2017). In particular, the average sample size of our sequential test is bounded universally under any pair of nominal and anomalous distributions, and our sequential test achieves a larger Bayesian exponent than the fixed-length test, which is not guaranteed by the sequential test of Li, Nitinawarat and Veeravalli (2017). For the case with at most one outlier, we propose a threshold-based test that has a bounded expected stopping time under mild conditions, and we bound the error exponents under each non-null hypothesis and under the null hypothesis. Our sequential test resolves the error exponent tradeoff of the fixed-length test of Zhou, Wei and Hero (TIT 2022). Finally, as a further step towards practical applications, we generalize our results to the case of multiple outliers and show that there is a penalty in the error exponents when the number of outliers is unknown.
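The sequential setting above takes one new sample from every sequence per time step and stops once a decision is reliable. The sketch below illustrates that setting for the exactly-one-outlier case on a finite alphabet. It is a minimal sketch under stated assumptions and is not the authors' test. For each candidate outlier i, the score sums the KL divergences of the other sequences' empirical distributions (types) to their average. This equals (M-1) times their equal-weight generalized Jensen-Shannon divergence, and it is small when all sequences other than i look alike. The stopping rule, which stops when n times the gap between the two smallest scores exceeds `threshold`, is a hypothetical choice. The names `scores`, `sequential_outlier_test`, `threshold`, `max_n`, and the example distributions are also hypothetical. The code only compares the sequences with each other and never uses the nominal or anomalous distribution.

```python
import math
import random
from collections import Counter


def kl(p, q):
    """KL divergence D(p || q) in nats for dicts mapping symbol -> probability.

    Assumes supp(p) is inside supp(q); this holds below because q is an
    average of types that includes p itself.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def empirical(counts, n):
    """Empirical distribution (type) of one sequence from its symbol counts."""
    return {x: c / n for x, c in counts.items()}


def scores(types):
    """Score_i = sum_{j != i} D(T_j || average of T_k over k != i).

    Under the hypothesis 'sequence i is the outlier', all other sequences are
    nominal, so their types should be close and Score_i should be small.
    """
    m = len(types)
    out = []
    for i in range(m):
        others = [types[j] for j in range(m) if j != i]
        avg = Counter()
        for t in others:
            for x, px in t.items():
                avg[x] += px / len(others)
        out.append(sum(kl(t, avg) for t in others))
    return out


def sequential_outlier_test(sample, m, threshold, max_n=10_000):
    """Hypothetical sequential rule: at each time n, draw one sample per
    sequence and stop once n * (second-smallest score - smallest score)
    reaches `threshold`. `sample(j)` returns the next symbol of sequence j.

    Returns (index declared to be the outlier, stopping time n).
    """
    counts = [Counter() for _ in range(m)]
    best, n = None, 0
    for n in range(1, max_n + 1):
        for j in range(m):
            counts[j][sample(j)] += 1
        s = scores([empirical(c, n) for c in counts])
        order = sorted(range(m), key=s.__getitem__)
        best = order[0]
        if n * (s[order[1]] - s[order[0]]) >= threshold:
            break
    return best, n


if __name__ == "__main__":
    # Hypothetical example: 5 binary sequences, sequence 2 is the outlier.
    rng = random.Random(0)
    nominal, anomalous, outlier = 0.5, 0.9, 2  # P(symbol == 0)

    def draw(j):
        p0 = anomalous if j == outlier else nominal
        return 0 if rng.random() < p0 else 1

    print(sequential_outlier_test(draw, m=5, threshold=10.0))
```

In this sketch, `threshold` trades the stopping time against the error probability. Raising it delays the decision and makes a wrong declaration less likely. The same code runs unchanged for any pair of distributions on the alphabet.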
