We study hypothesis testing under communication constraints, where each sample is quantized before being revealed to a statistician. Without communication constraints, it is well known that the sample complexity of simple binary hypothesis testing is characterized by the Hellinger distance between the distributions. We show that the sample complexity of simple binary hypothesis testing under communication constraints is at most a logarithmic factor larger than in the unconstrained setting, and that this bound is tight. We develop a polynomial-time algorithm that achieves this sample complexity. Our framework extends to robust hypothesis testing, where the distributions are corrupted in total variation distance. Our proofs rely on a new reverse data processing inequality and a reverse Markov inequality, which may be of independent interest. For simple $M$-ary hypothesis testing, the sample complexity in the absence of communication constraints depends logarithmically on $M$. We show that communication constraints can cause an exponential blow-up, leading to $\Omega(M)$ sample complexity even for adaptive algorithms.
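To make the role of the Hellinger distance concrete, here is a minimal sketch in Python. It assumes discrete distributions given as lists of probabilities, the convention $h^2(p, q) = 1 - \sum_i \sqrt{p_i q_i}$, and a one-bit quantizer that thresholds the likelihood ratio. The function names are hypothetical, and the brute-force search over thresholds is only an illustration, not the algorithm developed in the paper. Since the unconstrained sample complexity scales as $1/h^2(p, q)$ up to constants, comparing $h^2$ before and after quantization gives a rough sense of the price of communication constraints.

```python
import math


def squared_hellinger(p, q):
    """Squared Hellinger distance, using h^2(p, q) = 1 - sum_i sqrt(p_i * q_i)."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))


def best_threshold_quantizer(p, q):
    """Largest squared Hellinger distance after a one-bit threshold quantizer.

    Symbols are sorted by likelihood ratio p_i / q_i, and every split of the
    sorted list into a "low" and a "high" group is tried. This is an
    illustrative brute-force search (an assumption of this sketch).
    """
    def ratio(i):
        return p[i] / q[i] if q[i] > 0 else math.inf

    order = sorted(range(len(p)), key=ratio)
    best = 0.0
    for k in range(1, len(p)):
        low = order[:k]
        p_low = sum(p[i] for i in low)
        q_low = sum(q[i] for i in low)
        # Clamp to guard against tiny negative values from rounding.
        p_bits = [p_low, max(0.0, 1.0 - p_low)]
        q_bits = [q_low, max(0.0, 1.0 - q_low)]
        best = max(best, squared_hellinger(p_bits, q_bits))
    return best


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.3, 0.5]
    h2_full = squared_hellinger(p, q)
    h2_one_bit = best_threshold_quantizer(p, q)
    # 1 / h^2 is a proxy for sample complexity, up to constant factors.
    print("unconstrained proxy:", 1.0 / h2_full)
    print("one-bit proxy:      ", 1.0 / h2_one_bit)
```

By the data processing inequality, the quantized distance never exceeds the original one, so the one-bit proxy is at least as large as the unconstrained proxy. The abstract's main result bounds how much larger it can be.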