Distribution Testing Under the Parity Trace

Distribution testing is a fundamental statistical task with many applications, but in a variety of problems of interest, systematic mislabelings of the sample prevent the existing theory from being applied. To extend distribution testing to these problems, we introduce distribution testing under the parity trace, where the algorithm receives an ordered sample $S$ that reveals only the least significant bit of each element. This abstraction reveals connections between the following three problems of interest, yielding new upper and lower bounds:

1. In distribution testing with a confused collector, the collector of the sample (e.g. a machine learning classifier) may be incapable of distinguishing between nearby elements of a domain. We prove bounds for distribution testing with a confused collector on domains structured as a cycle or a path.
2. Recent work on the fundamental testing vs. learning question established tight lower bounds on distribution-free sample-based property testing by reduction from distribution testing, but the tightness is limited to symmetric properties. The parity trace allows a broader family of equivalences that extends to non-symmetric properties, while recovering and strengthening many of the previous results with a different technique.
3. We give the first results for property testing in the well-studied trace reconstruction model, where the goal is to test whether an unknown string $x$ satisfies some property or is far from satisfying that property, given only independent random traces of $x$.

Our main technical result is a tight bound of $\widetilde \Theta\left((n/\epsilon)^{4/5} + \sqrt n/\epsilon^2\right)$ on the sample complexity of testing uniformity of distributions over $[n]$ under the parity trace, which also leads to results for the three problems above.
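To make the input model concrete, the following is a minimal Python sketch of what an algorithm observes under the parity trace. It assumes that "ordered" means sorted in nondecreasing order and that the domain is $[n] = \{1, \dots, n\}$; the function names `parity_trace` and `draw_parity_trace` are hypothetical and chosen only for illustration.

```python
import random


def parity_trace(sample):
    """Sort the sample, then keep only the least significant bit of each element.

    Assumption: "ordered sample" is read as "sorted in nondecreasing order".
    """
    return [x & 1 for x in sorted(sample)]


def draw_parity_trace(weights, m, rng):
    """Draw m i.i.d. elements of {1, ..., n} with the given weights
    and return the parity trace of the resulting sample."""
    n = len(weights)
    sample = rng.choices(range(1, n + 1), weights=weights, k=m)
    return parity_trace(sample)


# Illustration: the uniform distribution over [8], with 12 samples.
rng = random.Random(0)
n, m = 8, 12
trace = draw_parity_trace([1.0 / n] * n, m, rng)
print(trace)  # a list of 12 bits; equal consecutive elements contribute equal bits
```

The trace discards a great deal of information: under this sketch, a point mass on 1 and a point mass on 3 both produce the all-ones trace, so they cannot be told apart from the trace alone.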
