We study the computational relationship between replicability (Impagliazzo et al. [STOC '22], Ghazi et al. [NeurIPS '21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM '98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ learning. Computationally, however, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or to restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ model but feasible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ learning and is closer in power to efficient differentially private learning, despite known computational separations between replicability and privacy. Additionally, we leverage our parity learner to prove that, assuming $RP \neq NP$, converting replicability to pure differential privacy requires a strict loss in sample complexity. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
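For context on the parity task, the following is a minimal sketch of the classical (non-replicable) approach to realizable parity learning: Gaussian elimination over GF(2). It is not the replicable algorithm described above. The function name `learn_parity` and the list-of-bits encoding are assumptions made for illustration. When the samples do not pin down a unique solution, the returned hypothesis depends on which samples were drawn, which is one way to see why this baseline is not replicable as written.

```python
from typing import List, Optional


def learn_parity(samples: List[List[int]], labels: List[int]) -> Optional[List[int]]:
    """Return some s in {0,1}^n with <x, s> = y (mod 2) for every labeled
    sample, or None if no such s exists (hypothetical helper for illustration)."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples[0])
    # Augmented matrix [x | y] with entries in GF(2).
    rows = [list(x) + [y] for x, y in zip(samples, labels)]
    pivots: List[int] = []  # pivot column of each reduced row, in order
    r = 0  # number of pivot rows found so far
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] == 1), None)
        if pivot is None:
            continue  # free column: no pivot available
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row (XOR is addition in GF(2)).
        for i in range(len(rows)):
            if i != r and rows[i][col] == 1:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # A remaining row of the form 0 = 1 means the labels fit no parity.
    if any(rows[i][n] == 1 for i in range(r, len(rows))):
        return None
    # Set free variables to 0 and read pivot variables off the reduced rows.
    s = [0] * n
    for i, col in enumerate(pivots):
        s[col] = rows[i][n]
    return s


# Samples labeled by the hidden parity s* = [1, 0, 1].
xs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
ys = [1, 0, 1, 0]
print(learn_parity(xs, ys))  # prints [1, 0, 1]
```

Setting the free variables to 0 is an arbitrary tie-breaking choice in this sketch; any consistent solution is a valid hypothesis in the realizable setting.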