The problem of robust hypothesis testing is studied, where under the null and the alternative hypotheses, the data-generating distributions are assumed to lie in some uncertainty sets, and the goal is to design a test that performs well under the worst-case distributions over the uncertainty sets. In this paper, uncertainty sets are constructed in a data-driven manner using a kernel method, i.e., they are centered around the empirical distributions of training samples from the null and alternative hypotheses, respectively, and are constrained via the distance between kernel mean embeddings of distributions in the reproducing kernel Hilbert space, i.e., the maximum mean discrepancy (MMD). The Bayesian setting and the Neyman-Pearson setting are investigated. For the Bayesian setting, where the goal is to minimize the worst-case error probability, an optimal test is first obtained when the alphabet is finite. When the alphabet is infinite, a tractable approximation is proposed to quantify the worst-case average error probability, and a kernel smoothing method is further applied to design a test that generalizes to unseen samples. A direct robust kernel test is also proposed and proved to be exponentially consistent. For the Neyman-Pearson setting, where the goal is to minimize the worst-case probability of miss detection subject to a constraint on the worst-case probability of false alarm, an efficient robust kernel test is proposed and is shown to be asymptotically optimal. Numerical results are provided to demonstrate the performance of the proposed robust tests.
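To make the MMD constraint concrete, the following is a minimal sketch of the empirical MMD between two samples, i.e., the RKHS distance between their kernel mean embeddings. The Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions made for illustration. The decision rule `nearest_in_mmd` is a hypothetical toy rule that picks the hypothesis whose training sample is closer in MMD; it is not claimed to be any of the tests proposed in the paper.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    x has shape (n, d) and y has shape (m, d). The Gaussian kernel is an
    assumption; any positive-definite kernel could be used instead.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """Empirical MMD between the empirical distributions of x and y.

    Uses the biased (V-statistic) estimate of MMD^2:
    mean(Kxx) + mean(Kyy) - 2 * mean(Kxy).
    """
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0)))


def nearest_in_mmd(test, train0, train1, bandwidth=1.0):
    """Hypothetical toy rule: return 0 if the test sample is closer in MMD
    to the null training sample, and 1 otherwise."""
    d0 = mmd(test, train0, bandwidth)
    d1 = mmd(test, train1, bandwidth)
    return 0 if d0 <= d1 else 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train0 = rng.normal(0.0, 1.0, size=(200, 1))  # training sample, null
    train1 = rng.normal(2.0, 1.0, size=(200, 1))  # training sample, alternative
    test = rng.normal(0.0, 1.0, size=(50, 1))     # test sample drawn from the null
    print("MMD(test, train0):", mmd(test, train0))
    print("MMD(test, train1):", mmd(test, train1))
    print("decision:", nearest_in_mmd(test, train0, train1))
```

In this sketch, an uncertainty set of radius `theta` (again a hypothetical name) around a training sample would consist of all distributions whose MMD to that sample's empirical distribution is at most `theta`.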