We study differentially private classification in the setting where only some features are sensitive, while the remaining features and the label are not. We adapt the definition of differential privacy to this setting in a natural way. Our main contribution is a novel adaptation of AdaBoost that is provably differentially private and that, in our experiments, significantly outperforms a natural benchmark that treats each individual's entire record as sensitive. As a surprising observation, we show that boosting randomly generated classifiers suffices to achieve high accuracy. Our approach also adapts easily to the classical setting in which all features are sensitive, providing an alternative algorithm for differentially private linear classification with a much simpler privacy proof and accuracy comparable to or higher than that of differentially private logistic regression on real-world datasets.
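To make the idea of boosting randomly generated classifiers concrete, the following is a minimal, non-private sketch of standard AdaBoost in which each weak learner is a random hyperplane drawn without looking at the data. It is an illustration under our own assumptions, not the algorithm of this paper: it contains no privacy mechanism, and the function names, the Gaussian sampling of hyperplanes, and the `pool_size` parameter are all hypothetical choices made for the example.

```python
import numpy as np


def boost_random_classifiers(X, y, n_rounds=100, pool_size=20, seed=0):
    """Non-private AdaBoost over random linear classifiers (illustrative only).

    X: (n, d) float array; y: (n,) array with labels in {-1, +1}.
    Returns (W, b, alpha) describing the weighted ensemble.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    dist = np.full(n, 1.0 / n)  # AdaBoost weights over training examples
    W, b, alpha = [], [], []
    for _ in range(n_rounds):
        # Draw a pool of hyperplanes at random, independently of the data.
        Wc = rng.standard_normal((pool_size, d))
        bc = rng.standard_normal(pool_size)
        preds = np.where(X @ Wc.T + bc >= 0, 1.0, -1.0)  # shape (n, pool_size)
        # Weighted error of every candidate under the current distribution.
        errs = dist @ (preds != y[:, None]).astype(float)
        # Keep the candidate farthest from chance; a candidate with error
        # above 1/2 receives a negative weight, which amounts to flipping it.
        k = int(np.argmax(np.abs(0.5 - errs)))
        err = float(np.clip(errs[k], 1e-10, 1.0 - 1e-10))
        a = 0.5 * np.log((1.0 - err) / err)
        # Standard AdaBoost reweighting: upweight misclassified examples.
        dist = dist * np.exp(-a * y * preds[:, k])
        dist = dist / dist.sum()
        W.append(Wc[k])
        b.append(bc[k])
        alpha.append(a)
    return np.array(W), np.array(b), np.array(alpha)


def predict(X, W, b, alpha):
    """Sign of the alpha-weighted vote of the stored random classifiers."""
    votes = np.where(X @ W.T + b >= 0, 1.0, -1.0)  # shape (n, n_rounds)
    return np.where(votes @ alpha >= 0, 1.0, -1.0)
```

In this sketch the data influence the ensemble only through the weighted errors used to select and weight each candidate, so those are the quantities a private variant would need to protect. How the paper's algorithm actually does so is not specified here.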