The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either of two settings: (i) the prior-free setting, with type-I error at most $\alpha$ and type-II error at most $\beta$; or (ii) the Bayesian setting, with Bayes error at most $\delta$ and prior distribution $(\alpha, 1-\alpha)$. Previously, this problem had been studied only when $\alpha = \beta$ (prior-free) or $\alpha = 1/2$ (Bayesian), where the sample complexity is known to be characterized, up to multiplicative constants, by the Hellinger divergence between $p$ and $q$. In this paper, we derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting, and (ii) all $\delta \le \alpha/4$ in the Bayesian setting. In particular, the formula admits equivalent expressions in terms of certain divergences from the Jensen--Shannon and Hellinger families. The main technical result is an $f$-divergence inequality between members of the Jensen--Shannon and Hellinger families, which we prove by combining information-theoretic tools with case-by-case analysis. We explore applications of our results to robust and distributed (locally-private and communication-constrained) hypothesis testing.
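To make the Bayesian definition concrete, the following Python sketch (an illustration, not code from the paper) computes the exact Bayesian sample complexity for Bernoulli $p = \mathrm{Ber}(a)$ and $q = \mathrm{Ber}(b)$ by evaluating the Bayes error of $n$ samples, and compares it with the classical Hellinger-based sufficiency bound obtained from $\min(x, y) \le \sqrt{xy}$. The function names, the convention $h^2(p,q) = 1 - \sum_x \sqrt{p(x) q(x)}$, and the assumption that the prior mass $\alpha$ sits on $p$ are illustrative choices; the sketch does not implement the paper's Jensen--Shannon/Hellinger formula.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient BC(p, q) = sum_x sqrt(p(x) q(x)) for finite distributions."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger_sq(p, q):
    """Squared Hellinger distance, using the convention h^2(p, q) = 1 - BC(p, q)."""
    return 1.0 - bhattacharyya(p, q)


def bayes_error_bernoulli(a, b, n, alpha=0.5):
    """Exact Bayes error for n i.i.d. samples: Ber(a) with prior alpha vs Ber(b) with prior 1 - alpha.

    Assumes 0 < a, b < 1. The number of ones k is a sufficient statistic, so the
    sum of min(alpha * p^n(x), (1 - alpha) * q^n(x)) over 2**n sequences collapses
    to a sum over k = 0..n. Binomial pmfs are computed in log space to avoid overflow.
    """
    total = 0.0
    for k in range(n + 1):
        log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        pk = math.exp(log_c + k * math.log(a) + (n - k) * math.log1p(-a))
        qk = math.exp(log_c + k * math.log(b) + (n - k) * math.log1p(-b))
        total += min(alpha * pk, (1 - alpha) * qk)
    return total


def exact_sample_complexity_bernoulli(a, b, delta, alpha=0.5):
    """Smallest n whose Bayes error is at most delta.

    Uses exponential then binary search, which is valid because the Bayes error is
    non-increasing in n (an optimal test may always ignore extra samples).
    """
    if a == b or delta <= 0:
        raise ValueError("need a != b and delta > 0")

    def err(n):
        return bayes_error_bernoulli(a, b, n, alpha)

    if err(0) <= delta:
        return 0
    hi = 1
    while err(hi) > delta:
        hi *= 2
    lo = hi // 2  # invariant: err(lo) > delta and err(hi) <= delta
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if err(mid) <= delta:
            hi = mid
        else:
            lo = mid
    return hi


def hellinger_upper_bound(p, q, delta, alpha=0.5):
    """Smallest n with sqrt(alpha (1 - alpha)) * BC(p, q)^n <= delta.

    Since min(x, y) <= sqrt(x y), this n guarantees Bayes error <= delta. Because
    -log(BC) >= 1 - BC = h^2, it is at most ceil(log(sqrt(alpha(1 - alpha)) / delta) / h^2).
    """
    bc = bhattacharyya(p, q)
    if bc >= 1.0:
        raise ValueError("p and q must differ")
    c = math.sqrt(alpha * (1 - alpha))
    if c <= delta:
        return 0
    return math.ceil(math.log(c / delta) / -math.log(bc))


# Usage: p = Ber(0.5), q = Ber(0.6), written as [P(0), P(1)].
p, q = [0.5, 0.5], [0.4, 0.6]
n_exact = exact_sample_complexity_bernoulli(0.5, 0.6, delta=0.01)
n_bound = hellinger_upper_bound(p, q, delta=0.01)
print(n_exact, n_bound, hellinger_sq(p, q))
```

For $\alpha = 1/2$ the bound scales like $\log(1/\delta)/h^2(p,q)$, in line with the known Hellinger characterization; for a skewed prior such as $\alpha = 0.01$ with $\delta \le \alpha/4$ it remains a valid upper bound but need not be tight, which is the regime the abstract's new formula targets.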