We study the task of training regression models under the guarantee of label differential privacy (DP). Given a global prior distribution over label values, which can itself be obtained privately, we derive a label DP randomization mechanism that is optimal for a given regression loss function. We prove that the optimal mechanism takes the form of "randomized response on bins" and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets, demonstrating the efficacy of our algorithm.
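To make the phrase "randomized response on bins" concrete, the following is a minimal sketch of one standard k-ary randomized response applied to a finite set of bin values. It is an illustration under stated assumptions, not the mechanism as specified in the paper: the function name `rr_on_bins`, the nearest-bin mapping from a label to a bin, and the example bin values are all hypothetical, and the sketch does not compute the optimal bin values, which is the job of the proposed algorithm.

```python
import math
import random


def rr_on_bins(label, bin_values, epsilon, rng=random):
    """Privatize one label with k-ary randomized response over bin values.

    Hypothetical helper: the nearest-bin mapping below is an assumption,
    and bin_values are taken as given rather than optimized.
    """
    k = len(bin_values)
    if k == 1:
        # With a single bin the output carries no information about the label.
        return bin_values[0]
    # Assumption: map the label to the index of the closest bin value.
    true_idx = min(range(k), key=lambda i: abs(bin_values[i] - label))
    # Probability of reporting the true bin: e^eps / (e^eps + k - 1),
    # written in a form that avoids overflow for large epsilon.
    p_keep = 1.0 / (1.0 + (k - 1) * math.exp(-epsilon))
    if rng.random() < p_keep:
        return bin_values[true_idx]
    # Otherwise report one of the other k - 1 bins uniformly at random.
    other = rng.randrange(k - 1)
    if other >= true_idx:
        other += 1
    return bin_values[other]
```

A call such as `rr_on_bins(4.2, [0.0, 5.0, 10.0], epsilon=1.0)` returns one of the three bin values. Any two labels induce output distributions whose probabilities differ by a factor of at most e^epsilon, which is the label DP guarantee for this sketch.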