The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. We investigate the problem of testing whether a distribution is supported on $m$ elements in this model. It turns out that the behavior of this property is surprisingly intricate, especially when also considering the question of adaptivity. We prove lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regimes. Our bounds are tight when $m$ is fixed to a constant (and the distance parameter $\varepsilon$ is the only variable). For the general case, our bounds are at most $O(\log m)$ apart. In particular, our results show a surprising $O(\log \varepsilon^{-1})$ gap between the number of queries required for non-adaptive testing as compared to adaptive testing. For one-sided error testing, we also show that an $O(\log m)$ gap between the number of samples and the number of queries is necessary. Our results utilize a wide variety of combinatorial and probabilistic methods.
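To make the access model concrete, here is a minimal illustrative sketch (not the paper's algorithm) of a one-sided, non-adaptive support-size tester: it draws samples, queries only a fixed set of bit positions from each sample (here, a prefix of length $q$), and rejects only upon witnessing more than $m$ distinct patterns. Since a distribution supported on at most $m$ strings can produce at most $m$ distinct patterns, such a tester never rejects a yes-instance, which is exactly the one-sided error guarantee. The oracle interface, the prefix query pattern, and the parameter choices below are assumptions for illustration only.

```python
import random

def huge_object_support_tester(sample_oracle, m, q, s):
    """Hedged sketch of a one-sided, non-adaptive tester in the Huge
    Object model.  Draws s independent samples, queries the first q bits
    of each (a fixed query pattern, hence non-adaptive), and rejects iff
    more than m distinct bit-patterns are observed.  If the distribution
    is supported on at most m strings, at most m patterns can ever
    appear, so a yes-instance is never rejected (one-sided error).
    This is an illustration of the model, not the paper's algorithm."""
    patterns = set()
    for _ in range(s):
        x = sample_oracle()          # one independent sample (a bit string)
        patterns.add(tuple(x[:q]))   # query only the first q bits of it
        if len(patterns) > m:
            return False             # witness: m + 1 distinct elements seen
    return True

# Toy usage: a uniform distribution over 3 strings of length 16.
support = [[random.randint(0, 1) for _ in range(16)] for _ in range(3)]
oracle = lambda: random.choice(support)
print(huge_object_support_tester(oracle, m=3, q=8, s=50))  # prints True
```

Note that detecting distance from being supported on $m$ elements (the soundness side) is where the actual query complexity bounds of the paper come in; the sketch only illustrates why one-sided completeness is easy in this model.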