Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected), while the remaining features and the label are unknown to the attacker and should be protected by the DP guarantee. This task interpolates between training the model with full DP (where the label and all features should be protected) and training it with label DP (where all the features are considered known, and only the label should be protected). We present a new algorithm for training DP models with semi-sensitive features. Through an empirical evaluation on real ads datasets, we demonstrate that our algorithm achieves higher utility than two baselines: (i) DP stochastic gradient descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP algorithm run only on the known features (discarding the unknown ones).
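One way to make the setting above concrete is to state the adjacency relation it implies. The following is a sketch under assumed notation, not necessarily the formal definition used in the paper: the symbols $x^{\mathrm{k}}$ (known features), $x^{\mathrm{u}}$ (unknown features), $y$ (label), and the mechanism $\mathcal{M}$ are introduced here for illustration only.

```latex
% Assumed notation: each example is a triple (x^k, x^u, y).
% Two datasets D and D' are adjacent if they differ in a single example
% and that example has the same known features in both datasets:
\[
  D \sim D' \iff
  D' = \bigl(D \setminus \{(x^{\mathrm{k}}, x^{\mathrm{u}}, y)\}\bigr)
       \cup \{(x^{\mathrm{k}}, \tilde{x}^{\mathrm{u}}, \tilde{y})\}
  \quad \text{for some example in } D
  \text{ and some } (\tilde{x}^{\mathrm{u}}, \tilde{y}).
\]
% A randomized training mechanism M is (epsilon, delta)-DP in this setting
% if, for all adjacent D, D' and all measurable output sets S:
\[
  \Pr[\mathcal{M}(D) \in S]
  \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta .
\]
```

Under this reading, the two endpoints mentioned above fall out as special cases: if $x^{\mathrm{k}}$ is empty, adjacent datasets may differ arbitrarily in one example, which is full DP (with substitution adjacency); if $x^{\mathrm{u}}$ is empty, adjacent datasets differ only in one label, which is label DP.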