We study the computational relationship between replicability (Impagliazzo et al. [STOC '22], Ghazi et al. [NeurIPS '21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC '06]) and to the statistical query (SQ) model (Kearns [JACM '98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ model but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
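To make the final claim concrete, the following is a minimal, non-replicable sketch (in Python, assuming 0/1 vectors over GF(2), the setting of parity learning) of the kind of object the building block outputs: a basis for the linear span of a set of vectors, together with the fraction of input vectors a candidate subspace covers. Outputting the full span trivially covers every input vector but is not replicable, since the span fluctuates with the sample; obtaining a replicable subspace that still covers most vectors is the paper's contribution, which this sketch does not attempt.

```python
import numpy as np

def gf2_basis(vectors):
    """Greedy Gaussian elimination over GF(2): return an independent basis
    of the linear span of the given 0/1 vectors."""
    basis = []
    for v in vectors:
        v = np.asarray(v, dtype=np.uint8) % 2
        for b in basis:
            pivot = int(np.argmax(b))    # position of b's leading 1
            if v[pivot]:
                v = v ^ b                # eliminate that coordinate
        if v.any():                      # v is independent of the basis so far
            basis.append(v)
    return basis

def in_span(v, basis):
    """Check membership of v in the GF(2) span of a basis produced above."""
    v = np.asarray(v, dtype=np.uint8) % 2
    for b in basis:
        pivot = int(np.argmax(b))
        if v[pivot]:
            v = v ^ b
    return not v.any()

def coverage(vectors, basis):
    """Fraction of `vectors` lying in the subspace spanned by `basis`:
    the quantity the replicable building block must keep large."""
    return sum(in_span(v, basis) for v in vectors) / len(vectors)

# Example: the span of three vectors in GF(2)^4 covers all of them by construction.
sample = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
B = gf2_basis(sample)
print(len(B), coverage(sample, B))  # 2 1.0 (third vector is the sum of the first two)
```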