The Huge Object model for distribution testing, first defined by Goldreich and Ron in 2022, combines the features of classical string testing and distribution testing. In this model we are given access to independent samples from an unknown distribution $P$ over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. The distinction between adaptive and non-adaptive algorithms, which is natural in the realm of string testing (but is not relevant for classical distribution testing), plays a substantial role in the Huge Object model as well. In this work we show that, in fact, the full picture in the Huge Object model is much richer than the ``adaptive vs. non-adaptive'' dichotomy. We define and investigate several models of adaptivity that lie between the fully adaptive and the completely non-adaptive extremes. These models arise naturally from viewing the querying process of each sample independently and considering the ``algorithmic flow'' between samples. For example, if we allow no information at all to cross over between samples (until the final decision), then we obtain the locally bounded adaptive model, arguably the ``least adaptive'' one apart from being completely non-adaptive. A slightly stronger model allows only a ``one-way'' information flow. Even stronger (but still far from fully adaptive) models follow by taking inspiration from the setting of streaming algorithms. To show that we indeed have a hierarchy, we prove a chain of exponential separations encompassing most of the models that we define.
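To make the access model concrete, the following is a minimal, informal sketch of sample-and-query access, together with two querying disciplines loosely modeled on the descriptions above. It is not the paper's formal definition: the names `HugeObjectOracle`, `non_adaptive`, `per_sample_adaptive`, and `next_position` are hypothetical, and the per-sample routine only illustrates the idea that no information crosses between samples before the final decision.

```python
import random


class HugeObjectOracle:
    """Illustrative sample-and-query access to a distribution over {0,1}^n."""

    def __init__(self, sampler, seed=0):
        self._sampler = sampler        # assumed callable: rng -> tuple of n bits
        self._rng = random.Random(seed)
        self._samples = []             # drawn strings, hidden from the tester
        self.queries = 0               # total number of bit queries made so far

    def new_sample(self):
        """Draw a fresh independent sample; return its index, not its bits."""
        self._samples.append(self._sampler(self._rng))
        return len(self._samples) - 1

    def query(self, sample_idx, bit_idx):
        """Reveal a single bit of a single sample and count the query."""
        self.queries += 1
        return self._samples[sample_idx][bit_idx]


def non_adaptive(oracle, num_samples, positions):
    """Every queried position is fixed before any answer is seen."""
    ids = [oracle.new_sample() for _ in range(num_samples)]
    return [[oracle.query(i, j) for j in positions] for i in ids]


def per_sample_adaptive(oracle, num_samples, next_position, budget):
    """Within one sample, the next position may depend on earlier answers
    from that same sample; the history is reset for every new sample, so
    nothing is carried over between samples."""
    transcripts = []
    for _ in range(num_samples):
        i = oracle.new_sample()
        answers = []                   # history local to this sample only
        for _ in range(budget):
            j = next_position(answers)
            answers.append((j, oracle.query(i, j)))
        transcripts.append(answers)
    return transcripts


def uniform_sampler(n):
    """Hypothetical example input: the uniform distribution over {0,1}^n."""
    return lambda rng: tuple(rng.randrange(2) for _ in range(n))
```

For instance, `non_adaptive(HugeObjectOracle(uniform_sampler(8)), 3, [0, 2])` reads positions 0 and 2 of three samples for a total of six queries, whereas `per_sample_adaptive` lets a caller-supplied `next_position` rule steer the queries inside each sample. A one-way or streaming-style variant would additionally pass some state from one sample's loop iteration to the next; that is deliberately left out of this sketch.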