In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain that is reasonably close to the target domain. This is the modern problem of supervised domain adaptation from a public source domain to a private target domain. We present two $(\epsilon, \delta)$-differentially private algorithms for supervised adaptation, both based on a general optimization problem recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and is shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but are Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments. These first demonstrate that the non-private versions of our algorithms outperform adaptation baselines, and then show that, for larger values of the target sample size or of $\epsilon$, the performance of our private algorithms remains close to that of their non-private counterparts.
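To make the public-source, private-target setting concrete, the following is a minimal sketch of one noisy gradient step for linear regression with the squared loss. It is not either of the algorithms presented here: it swaps in the classical Gaussian mechanism with per-example gradient clipping, and every name in it (`gaussian_sigma`, `clipped_mean_gradient`, `private_adaptation_step`, `source_weight`, `clip`, `lr`) is a hypothetical choice made for illustration. The sketch assumes replace-one adjacency on the target sample, a publicly known target sample size, and $0 < \epsilon < 1$, the range in which the classical noise calibration is valid.

```python
import math
import numpy as np


def gaussian_sigma(sensitivity, eps, delta):
    """Noise scale of the classical Gaussian mechanism (valid for 0 < eps < 1)."""
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < eps < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def per_example_gradients(w, X, y):
    """Gradients of the squared loss (x.w - y)^2, one row per example."""
    residual = X @ w - y
    return 2.0 * residual[:, None] * X


def clipped_mean_gradient(w, X, y, clip):
    """Mean of per-example gradients, each rescaled to L2 norm at most `clip`."""
    grads = per_example_gradients(w, X, y)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    return (grads * scale).mean(axis=0)


def private_adaptation_step(w, Xs, ys, Xt, yt, *, eps, delta,
                            clip=1.0, lr=0.1, source_weight=0.5, rng=None):
    """One gradient step mixing a public source gradient with a noisy target gradient.

    Only the target sample (Xt, yt) is treated as private. The source sample
    (Xs, ys) is public, so its gradient is used without clipping or noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_target = Xt.shape[0]

    # Public part: plain average gradient on the source sample.
    g_source = per_example_gradients(w, Xs, ys).mean(axis=0)

    # Private part: replacing one target example moves the clipped mean
    # by at most 2 * clip / n_target in L2 norm (assumed adjacency notion).
    g_target = clipped_mean_gradient(w, Xt, yt, clip)
    sigma = gaussian_sigma(2.0 * clip / n_target, eps, delta)
    g_target_noisy = g_target + rng.normal(0.0, sigma, size=w.shape)

    # Mixing with public quantities is post-processing of the noisy gradient.
    mixed = source_weight * g_source + (1.0 - source_weight) * g_target_noisy
    return w - lr * mixed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    Xs = rng.normal(size=(200, 3))   # synthetic public source sample
    ys = Xs @ w_true
    Xt = rng.normal(size=(40, 3))    # synthetic private target sample
    yt = Xt @ w_true
    w = private_adaptation_step(np.zeros(3), Xs, ys, Xt, yt,
                                eps=0.5, delta=1e-5, rng=rng)
    print(w)
```

The privacy statement covers only this single release of the noisy target gradient. Running several such steps would require accounting for composition across steps, which the sketch deliberately leaves out.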