Local differential privacy (LDP) has become a central topic in data privacy research, offering strong privacy guarantees by perturbing user data at the source and removing the need for a trusted curator. However, the noise introduced by LDP often reduces data utility substantially. To address this issue, we reinterpret private learning under LDP as a transfer learning problem, in which the noisy data serve as the source domain and the unobserved clean data as the target. We propose novel techniques designed specifically for LDP to improve classification performance without compromising privacy: (1) an evaluation mechanism based on noisy binary feedback for estimating dataset utility; (2) model reversal, which salvages underperforming classifiers by inverting their decision boundaries; and (3) model averaging, which weights multiple reversed classifiers by their estimated utility. We derive theoretical excess risk bounds under LDP and show how our methods reduce this risk. Empirical results on both simulated and real-world datasets demonstrate substantial improvements in classification accuracy.
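The three components named above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes binary labels privatized by standard randomized response, a debiasing step to turn agreement with noisy labels into a utility estimate (hypothetical helper names throughout), reversal of any classifier whose estimated accuracy falls below chance, and a utility-weighted majority vote:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(labels, epsilon):
    """epsilon-LDP randomized response on binary labels:
    keep a label with probability e^eps / (e^eps + 1), flip it otherwise."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    flip = rng.random(len(labels)) > p_keep
    return np.where(flip, 1 - labels, labels)

def estimate_utility(model, x, y_private, epsilon):
    """Utility estimate from noisy binary feedback: agreement with the
    privatized labels, debiased using the known flip probability."""
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    noisy_acc = np.mean(model(x) == y_private)
    # Invert E[noisy_acc] = acc * p_keep + (1 - acc) * (1 - p_keep)
    return (noisy_acc - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)

def reverse_if_bad(model, utility):
    """Model reversal: a classifier estimated below chance is salvaged
    by inverting its decision boundary (flipping its predictions)."""
    if utility < 0.5:
        return (lambda x: 1 - model(x)), 1.0 - utility
    return model, utility

def weighted_average(models, utilities, x):
    """Model averaging: weight each (possibly reversed) classifier by
    its estimated utility and take a weighted majority vote."""
    w = np.clip(np.asarray(utilities, dtype=float) - 0.5, 0.0, None)
    votes = np.stack([m(x) for m in models])  # shape (n_models, n_points)
    score = (w[:, None] * votes).sum(axis=0)
    return (score > 0.5 * w.sum()).astype(int)
```

In this sketch only the privatized labels and the (post-processed) noisy agreement rate are ever consumed, so no additional privacy budget is spent beyond the randomized response step.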