Machine learning (ML) models can memorize their training data, so training ML models on private datasets can violate the privacy of the individuals represented in them. Differential privacy (DP) is a rigorous privacy notion that protects the underlying training data of ML models. However, training ML models under DP usually degrades their accuracy. This paper aims to boost the accuracy of a DP-ML model, specifically a logistic regression model, via a pre-training module. In more detail, we first pre-train the model on a public training dataset that raises no privacy concerns. Then, we fine-tune the model on the private dataset using DP logistic regression. Our numerical results show that adding a pre-training module significantly improves the accuracy of DP logistic regression.
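The two-stage pipeline described in the abstract (public pre-training, then private fine-tuning) can be illustrated with a minimal NumPy sketch. The abstract does not specify which DP mechanism is used, so this sketch assumes a DP-SGD-style update with per-example gradient clipping and Gaussian noise; that choice is an assumption, not necessarily the paper's method. The function names (`pretrain`, `dp_finetune`) and all hyperparameters (`clip`, `noise_multiplier`, learning rates, step counts) are hypothetical, and privacy accounting (converting the noise level into an (ε, δ) guarantee) is deliberately left out.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def pretrain(X_pub, y_pub, steps=200, lr=0.5):
    """Non-private gradient descent on the public dataset (no noise needed)."""
    w = np.zeros(X_pub.shape[1])
    for _ in range(steps):
        grad = X_pub.T @ (sigmoid(X_pub @ w) - y_pub) / len(y_pub)
        w -= lr * grad
    return w

def dp_finetune(w_init, X_priv, y_priv, steps=50, lr=0.1,
                clip=1.0, noise_multiplier=1.0, seed=0):
    """Assumed DP-SGD-style fine-tuning on the private dataset.

    Each example's gradient is clipped to L2 norm at most `clip`, the
    clipped gradients are summed, and Gaussian noise with standard
    deviation `noise_multiplier * clip` is added before averaging.
    """
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    n = len(y_priv)
    for _ in range(steps):
        # Per-example logistic-loss gradients, shape (n, d).
        per_example = (sigmoid(X_priv @ w) - y_priv)[:, None] * X_priv
        norms = np.linalg.norm(per_example, axis=1, keepdims=True)
        clipped = per_example / np.maximum(1.0, norms / clip)
        noise = rng.normal(0.0, noise_multiplier * clip, size=w.shape)
        w -= lr * (clipped.sum(axis=0) + noise) / n
    return w
```

A typical call would be `w = dp_finetune(pretrain(X_pub, y_pub), X_priv, y_priv)`, with labels encoded as 0/1. Starting the private stage from the pre-trained weights rather than from zeros is the point of the design: fewer noisy steps are needed to reach a good solution, which is the intuition behind the accuracy gain the abstract reports.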