Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization

The notion of replicable algorithms was introduced in Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $\delta$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.
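To make the definition of replicability concrete, here is a minimal sketch of a replicable mean estimator. It is not a construction taken from this work: it uses rounding to a randomly shifted grid, an idea commonly used in the replicability literature, and the names `replicable_mean`, `seed`, and `grid_width` are hypothetical choices made for illustration. The shared `seed` plays the role of the fixed internal randomness, and the two input lists stand in for two independent samples from the same distribution.

```python
import math
import random


def replicable_mean(sample, seed, grid_width=0.1):
    """Estimate the mean of `sample`, rounded to a randomly shifted grid.

    Illustrative assumption: `seed` is the algorithm's internal randomness,
    shared across runs. Two runs with the same seed return the same value
    unless a grid boundary falls between their two empirical means.
    """
    rng = random.Random(seed)               # fixed internal randomness
    offset = rng.uniform(0.0, grid_width)   # random shift of the grid
    mean = sum(sample) / len(sample)
    # Index of the grid cell [offset + k*w, offset + (k+1)*w) containing the mean.
    k = math.floor((mean - offset) / grid_width)
    # Output the center of that cell, so the error is at most grid_width / 2.
    return offset + (k + 0.5) * grid_width


# Two stand-in "samples" whose empirical means differ slightly (0.500 vs 0.501).
sample_a = [0.5] * 100
sample_b = [0.5] * 99 + [0.6]

# Count how often the two runs agree exactly over many choices of the shared seed.
agreements = sum(
    replicable_mean(sample_a, seed) == replicable_mean(sample_b, seed)
    for seed in range(200)
)
print(f"identical outputs for {agreements} of 200 seeds")
```

For an idealized uniform offset, the two runs disagree only when a grid boundary lands between the two empirical means, which happens with probability equal to their distance divided by `grid_width`. A coarser grid therefore buys more replicability at the cost of accuracy, which is the kind of trade-off the sample-complexity results in this line of work quantify.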
