On the Sample Complexity of Robust Binary Hypothesis Testing

We study the sample complexity of robust binary hypothesis testing under three standard contamination models: $\varepsilon$-additive (Huber), $\varepsilon$-subtractive, and $\varepsilon$-total variation (TV). We denote the corresponding sample complexities by $n^*_{\mathrm{Hub}}(\varepsilon)$, $n^*_{\mathrm{Sub}}(\varepsilon)$, and $n^*_{\mathrm{TV}}(\varepsilon)$, respectively. For subtractive contamination, we show that least favourable distributions exist and provide explicit formulas for them, bringing this model in line with the classical Huber and TV models. Next, we show that in all three models the sample complexity may be highly unstable in the contamination parameter $\varepsilon$, increasing by polynomial factors even under perturbations of size $o(\varepsilon)$. Similarly, there may be polynomial-factor gaps between the sample complexity when $\varepsilon$ is known exactly and the sample complexity when it is known only up to $o(\varepsilon)$ error. Despite this instability, we show that the sample complexities across models are comparable up to a constant-factor rescaling of $\varepsilon$. Specifically, for any fixed $\delta_0>0$, the following hold for all distributions $p$ and $q$: (i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$, (ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+\delta_0)\varepsilon)$, and (iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+\delta_0)\varepsilon)$. The scaling constants in (i)-(iii) are tight. Finally, we extend our results to adaptive versions of the contamination models.
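
To make the three models concrete, the following is a sketch of the usual textbook definitions of the contamination sets around a nominal distribution $p$ (and likewise around $q$). These definitions are an assumption, since the abstract does not state them, and the symbols $\mathcal{P}_{\mathrm{Hub}}$, $\mathcal{P}_{\mathrm{Sub}}$, $\mathcal{P}_{\mathrm{TV}}$, $r$, and $d_{\mathrm{TV}}$ are introduced here for illustration only.

$$
\begin{aligned}
\mathcal{P}_{\mathrm{Hub}}(p,\varepsilon) &= \{\, (1-\varepsilon)\,p + \varepsilon\, r \;:\; r \text{ a distribution} \,\}, \\
\mathcal{P}_{\mathrm{Sub}}(p,\varepsilon) &= \{\, p' \;:\; p = (1-\varepsilon)\,p' + \varepsilon\, r \text{ for some distribution } r \,\}, \\
\mathcal{P}_{\mathrm{TV}}(p,\varepsilon) &= \{\, p' \;:\; d_{\mathrm{TV}}(p,p') \le \varepsilon \,\}.
\end{aligned}
$$

Under these assumed definitions, a Huber-contaminated $p'$ satisfies $p' - p = \varepsilon\,(r - p)$ and a subtractively contaminated $p'$ satisfies $p - p' = \varepsilon\,(r - p')$, so in both cases $d_{\mathrm{TV}}(p,p') \le \varepsilon$. Both sets are therefore contained in $\mathcal{P}_{\mathrm{TV}}(p,\varepsilon)$, which is consistent with the left-hand inequalities in (i) and (ii): testing against a larger contamination set cannot require fewer samples.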
