Private Realizable-to-Agnostic Transformation with Near-Optimal Sample Complexity

The realizable-to-agnostic transformation (Beimel et al., 2015; Alon et al., 2020) provides a general mechanism for converting a private learner in the realizable setting (where the examples are labeled by some function in the concept class) into a private learner in the agnostic setting (where no assumptions are imposed on the data). Specifically, for any concept class $\mathcal{C}$ and error parameter $\alpha$, a private realizable learner for $\mathcal{C}$ can be transformed into a private agnostic learner while increasing the sample complexity by only $\widetilde{O}(\mathrm{VC}(\mathcal{C})/\alpha^2)$, which is essentially tight when the privacy parameter is constant, $\varepsilon = \Theta(1)$. For arbitrary $\varepsilon$, however, one has to apply the standard privacy-amplification-by-subsampling technique (Kasiviswanathan et al., 2011), which yields a suboptimal extra sample complexity of $\widetilde{O}(\mathrm{VC}(\mathcal{C})/(\alpha^2\varepsilon))$, a factor of $1/\varepsilon$ larger. In this work, we give an improved construction that eliminates the dependence on $\varepsilon$, thereby achieving a near-optimal extra sample complexity of $\widetilde{O}(\mathrm{VC}(\mathcal{C})/\alpha^2)$ for any $\varepsilon\le 1$. Moreover, our result reveals that in private agnostic learning, the cost of privacy is significant only for the realizable part. We also leverage our technique to obtain a nearly tight sample complexity bound for the private prediction problem, resolving an open question posed by Dwork and Feldman (2018) and Dagan and Feldman (2020).
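To make the $1/\varepsilon$ overhead of the subsampling route concrete, the following is a minimal numerical sketch, not the construction from this work. It assumes the standard amplification bound for Poisson subsampling (an $\varepsilon_0$-DP mechanism run on a subsample with rate $q$ is $\ln(1 + q(e^{\varepsilon_0} - 1))$-DP), a base learner with $\varepsilon_0 = 1$, and it drops all constants and logarithmic factors hidden by $\widetilde{O}(\cdot)$. The function names are hypothetical and chosen only for illustration.

```python
import math


def amplified_epsilon(eps0: float, q: float) -> float:
    """Privacy parameter after running an eps0-DP mechanism on a Poisson
    subsample with sampling rate q (standard amplification bound)."""
    return math.log1p(q * math.expm1(eps0))


def rate_for_target(eps: float, eps0: float = 1.0) -> float:
    """Sampling rate q at which the amplified parameter equals eps.
    Assumes 0 < eps <= eps0, so that 0 < q <= 1."""
    return math.expm1(eps) / math.expm1(eps0)


def extra_samples_subsampling(vc_dim: int, alpha: float, eps: float) -> float:
    """Illustrative extra sample cost via subsampling: the eps0 = 1 cost
    vc_dim / alpha^2 divided by the sampling rate (constants and logs dropped)."""
    return (vc_dim / alpha**2) / rate_for_target(eps)


def extra_samples_improved(vc_dim: int, alpha: float) -> float:
    """Illustrative extra sample cost with no dependence on eps:
    vc_dim / alpha^2 (constants and logs dropped)."""
    return vc_dim / alpha**2


if __name__ == "__main__":
    d, alpha = 10, 0.1  # hypothetical VC dimension and error parameter
    for eps in (1.0, 0.1, 0.01):
        q = rate_for_target(eps)
        ratio = extra_samples_subsampling(d, alpha, eps) / extra_samples_improved(d, alpha)
        print(f"eps={eps}: rate q={q:.4f}, overhead ratio={ratio:.1f}")
```

Under these assumptions the ratio between the two costs equals $1/q = (e - 1)/(e^{\varepsilon} - 1)$, which grows like $1/\varepsilon$ as $\varepsilon \to 0$; this is the factor that the improved construction removes.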
