The problem of robust hypothesis testing is studied, where under the null and the alternative hypotheses, the data-generating distributions are assumed to lie in some uncertainty sets, and the goal is to design a test that performs well under the worst-case distributions over the uncertainty sets. In this paper, uncertainty sets are constructed in a data-driven manner using a kernel method, i.e., they are centered around the empirical distributions of training samples from the null and alternative hypotheses, respectively, and are constrained via the distance between kernel mean embeddings of distributions in the reproducing kernel Hilbert space, i.e., the maximum mean discrepancy (MMD). The Bayesian setting and the Neyman-Pearson setting are investigated. For the Bayesian setting, where the goal is to minimize the worst-case error probability, an optimal test is first obtained when the alphabet is finite. When the alphabet is infinite, a tractable approximation is proposed to quantify the worst-case average error probability, and a kernel smoothing method is further applied to design a test that generalizes to unseen samples. A direct robust kernel test is also proposed and proved to be exponentially consistent. For the Neyman-Pearson setting, where the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm, an efficient robust kernel test is proposed and shown to be asymptotically optimal. Numerical results are provided to demonstrate the performance of the proposed robust tests.
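To make the MMD-constrained uncertainty sets concrete, the following is a minimal sketch of how the MMD between two empirical distributions can be computed and used as a membership check for an MMD ball centered at a training sample. The Gaussian kernel, the `bandwidth` parameter, the `radius` value, and the function names `gaussian_kernel`, `mmd`, and `in_uncertainty_set` are all assumptions made for illustration; the abstract does not specify a kernel or a radius, and this is not the paper's test itself.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y.

    The Gaussian kernel is an illustrative choice, not one stated in the abstract.
    """
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """MMD between the empirical distributions of samples x and y.

    This is the RKHS distance between the two empirical kernel mean embeddings:
    sqrt(mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)).
    """
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0)))


def in_uncertainty_set(candidate, training, radius, bandwidth=1.0):
    """Check whether the empirical distribution of `candidate` lies within an
    MMD ball of the given (hypothetical) radius around that of `training`."""
    return mmd(candidate, training, bandwidth) <= radius
```

As a usage example, for training samples `x0` from the null hypothesis and a batch of test samples `z`, `in_uncertainty_set(z, x0, radius=0.1)` returns whether the empirical distribution of `z` falls inside the null uncertainty set under these assumed settings. In practice both the bandwidth and the radius would need to be chosen for the data at hand.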