We propose an e-value based framework for testing arbitrary composite nulls against composite alternatives when an $\epsilon$ fraction of the data can be arbitrarily corrupted. Our tests are inherently sequential, being valid at arbitrary data-dependent stopping times, but they are new even for fixed sample sizes, giving type-I error control without any regularity conditions. We first prove that least favorable distribution (LFD) pairs, when they exist, yield optimal e-values for testing arbitrary composite nulls against composite alternatives. We then show that if an LFD pair exists for some composite null and alternative, then the LFDs of Huber's $\epsilon$-contamination or total variation (TV) neighborhoods around that specific pair form the optimal LFD pair for the corresponding robustified composite hypotheses. Furthermore, where LFDs do not exist, we develop new robust composite tests for general settings. Our test statistic is a nonnegative supermartingale under the (robust) null, even under a sequentially adaptive (non-i.i.d.) contamination model where the conditional distribution of each observation given the past data lies within an $\epsilon$ TV ball of some distribution in the original composite null. When LFDs exist, our supermartingale grows to infinity exponentially fast, at the optimal rate, under any distribution in the ($\epsilon$ TV-corruption of the) alternative. When LFDs do not exist, we provide an asymptotic growth rate analysis, showing that as $\epsilon \to 0$, the exponent converges to the corresponding Kullback-Leibler divergence, recovering the classical optimal non-robust rate. Simulations validate the theory and demonstrate reasonable practical performance.
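To make the idea of a robust sequential e-value test concrete, the following minimal Python sketch multiplies per-observation e-values and rejects once the product reaches $1/\alpha$ (Ville's inequality). It is an illustrative assumption throughout and not the LFD-based construction described above: it uses a simple null $N(0,1)$ against $N(\mu_1,1)$, truncates the likelihood ratio above at a hypothetical cap, and divides by $1 + \epsilon \cdot \text{cap}$. The justification is elementary: a statistic taking values in $[0, \text{cap}]$ changes its expectation by at most $\epsilon \cdot \text{cap}$ when the distribution moves by at most $\epsilon$ in TV distance, so the rescaled statistic has conditional expectation at most one under any such contaminated null. The function names, `mu1`, `cap`, and the default values are all hypothetical choices, and this conservative construction makes no claim to the optimal growth rate.

```python
import math


def log_robust_e_value(x, eps=0.05, mu1=1.0, cap=3.0):
    """Log of a conservative robust e-value for one observation x.

    Assumed setup (illustrative only): null N(0, 1), alternative N(mu1, 1).
    The likelihood ratio exp(mu1 * x - mu1**2 / 2) is truncated above at `cap`,
    so it lies in [0, cap] and has expectation at most 1 under N(0, 1).
    Under any Q within TV distance eps of N(0, 1), its expectation is at most
    1 + eps * cap, so dividing by that constant restores the e-value property.
    """
    log_lr = mu1 * x - 0.5 * mu1 ** 2
    # Work in log space so extreme observations cause no overflow or underflow.
    return min(log_lr, math.log(cap)) - math.log1p(eps * cap)


def sequential_test(xs, alpha=0.05, eps=0.05, mu1=1.0, cap=3.0):
    """Return the 1-based index at which the test rejects, or None.

    The running product of robust e-values is a nonnegative supermartingale
    under the contaminated null, so by Ville's inequality the probability
    that it ever reaches 1 / alpha is at most alpha.
    """
    log_wealth = 0.0
    threshold = math.log(1.0 / alpha)
    for t, x in enumerate(xs, start=1):
        log_wealth += log_robust_e_value(x, eps=eps, mu1=mu1, cap=cap)
        if log_wealth >= threshold:
            return t
    return None
```

With the default parameters, a stream of observations all equal to 10.0 is rejected at the fourth observation, since each contributes the maximal factor $3/1.15$ and four such factors first exceed $1/\alpha = 20$. The truncation level trades robustness against power: a larger cap tracks the likelihood ratio more closely but pays a larger normalizing penalty $1 + \epsilon \cdot \text{cap}$.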