The realizable-to-agnostic transformation (Beimel et al., 2015; Alon et al., 2020) provides a general mechanism for converting a private learner in the realizable setting (where the examples are labeled by some function in the concept class) into a private learner in the agnostic setting (where no assumptions are imposed on the data). Specifically, for any concept class $\mathcal{C}$ and error parameter $\alpha$, a private realizable learner for $\mathcal{C}$ can be transformed into a private agnostic learner while increasing the sample complexity by only $\widetilde{O}(\mathrm{VC}(\mathcal{C})/\alpha^2)$, which is essentially tight when the privacy parameter is a constant, $\varepsilon = \Theta(1)$. When $\varepsilon$ may be arbitrarily small, however, one has to apply the standard privacy-amplification-by-subsampling technique (Kasiviswanathan et al., 2011), resulting in a suboptimal extra sample complexity of $\widetilde{O}(\mathrm{VC}(\mathcal{C})/(\alpha^2\varepsilon))$ that involves an additional $1/\varepsilon$ factor. In this work, we give an improved construction that eliminates this dependence on $\varepsilon$, achieving a near-optimal extra sample complexity of $\widetilde{O}(\mathrm{VC}(\mathcal{C})/\alpha^2)$ for any $\varepsilon\le 1$. Moreover, our result reveals that in private agnostic learning, the privacy cost is significant only for the realizable part of the problem. We also leverage our technique to obtain a nearly tight sample complexity bound for the private prediction problem, resolving an open question posed by Dwork and Feldman (2018) and by Dagan and Feldman (2020).
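To see where the $1/\varepsilon$ overhead comes from, recall the amplification-by-subsampling argument: running an $\varepsilon_0$-DP mechanism on a random subsample of rate $q$ yields roughly $q\varepsilon_0$-DP, so hitting a small target $\varepsilon$ forces $q \approx \varepsilon/\varepsilon_0$ and inflates the number of samples by a $1/\varepsilon$ factor. Below is a minimal numerical sketch of that calculation; it uses the standard closed-form amplification bound for Poisson subsampling (the function name is ours, not from the paper):

```python
import math

def amplified_epsilon(eps0: float, q: float) -> float:
    """Standard amplification-by-subsampling bound: running an
    eps0-DP mechanism on a Poisson subsample with sampling rate q
    yields ln(1 + q * (exp(eps0) - 1))-DP, which is about q * eps0
    when q * eps0 is small."""
    return math.log1p(q * math.expm1(eps0))

# With q = 1 (no subsampling) the guarantee is unchanged.
full = amplified_epsilon(1.0, 1.0)          # exactly eps0 = 1.0

# Shrinking the sampling rate shrinks epsilon roughly linearly,
# which is why reaching a small target epsilon costs a 1/epsilon
# blow-up in sample complexity.
for q in (0.5, 0.1, 0.01):
    print(q, amplified_epsilon(1.0, q))     # roughly q * eps0
```

This is only the classical route that the abstract's improved construction avoids: the new transformation sidesteps subsampling entirely, so the extra sample complexity stays $\widetilde{O}(\mathrm{VC}(\mathcal{C})/\alpha^2)$ with no $1/\varepsilon$ factor.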