In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain that is reasonably close to the target domain. This is the modern problem of supervised domain adaptation from a public source domain to a private target domain. We present two $(\epsilon, \delta)$-differentially private algorithms for supervised adaptation, both of which build on a general optimization problem recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and is shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but are Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments, first demonstrating that the non-private versions of our algorithms outperform adaptation baselines, and then showing that, for larger values of the target sample size or of $\epsilon$, the performance of our private algorithms remains close to that of their non-private counterparts.