The Huge Object model is a distribution testing model in which we are given access to independent samples from an unknown distribution over the set of strings $\{0,1\}^n$, but are only allowed to query a few bits from the samples. We investigate the problem of testing whether a distribution is supported on $m$ elements in this model. It turns out that the behavior of this property is surprisingly intricate, especially when also considering the question of adaptivity. We prove lower and upper bounds for both adaptive and non-adaptive algorithms in the one-sided and two-sided error regimes. Our bounds are tight when $m$ is fixed to a constant (and the distance parameter $\varepsilon$ is the only variable). For the general case, our bounds are at most $O(\log m)$ apart. In particular, our results show a surprising $O(\log \varepsilon^{-1})$ gap between the number of queries required for non-adaptive testing as compared to adaptive testing. For one-sided error testing, we also show that an $O(\log m)$ gap between the number of samples and the number of queries is necessary. Our results utilize a wide variety of combinatorial and probabilistic methods.
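To make the access model concrete, here is a minimal illustrative sketch (not the paper's algorithm) of a one-sided, non-adaptive support-size tester: it draws samples, queries only a fixed set of bit positions from each sample (here, a prefix of length $q$), and rejects only upon witnessing more than $m$ distinct patterns. Since a distribution supported on at most $m$ strings can produce at most $m$ distinct patterns, such a tester never rejects a yes-instance, which is exactly the one-sided error guarantee. The oracle interface, the prefix query pattern, and the parameter choices below are assumptions for illustration only.

```python
import random

def huge_object_support_tester(sample_oracle, m, q, s):
    """Hedged sketch of a one-sided, non-adaptive tester in the Huge
    Object model.  Draws s independent samples, queries the first q bits
    of each (a fixed query pattern, hence non-adaptive), and rejects iff
    more than m distinct bit-patterns are observed.  If the distribution
    is supported on at most m strings, at most m patterns can ever
    appear, so a yes-instance is never rejected (one-sided error).
    This is an illustration of the model, not the paper's algorithm."""
    patterns = set()
    for _ in range(s):
        x = sample_oracle()          # one independent sample (a bit string)
        patterns.add(tuple(x[:q]))   # query only the first q bits of it
        if len(patterns) > m:
            return False             # witness: m + 1 distinct elements seen
    return True

# Toy usage: a uniform distribution over 3 strings of length 16.
support = [[random.randint(0, 1) for _ in range(16)] for _ in range(3)]
oracle = lambda: random.choice(support)
print(huge_object_support_tester(oracle, m=3, q=8, s=50))  # prints True
```

Note that detecting distance from being supported on $m$ elements (the soundness side) is where the actual query complexity bounds of the paper come in; the sketch only illustrates why one-sided completeness is easy in this model.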