Differentially private (DP) mechanisms can be embedded into the design of a machine learning algorithm to protect the resulting model against privacy leakage. However, this often comes with a significant loss of accuracy due to the noise added to enforce DP. In this paper, we aim to improve this trade-off for a popular class of machine learning algorithms that leverage the Gini impurity as an information gain criterion to greedily build interpretable models such as decision trees or rule lists. To this end, we establish the smooth sensitivity of the Gini impurity, which can be used to obtain rigorous DP guarantees while adding noise of smaller magnitude. We illustrate the applicability of this mechanism by integrating it within a greedy algorithm producing rule list models, motivated by the fact that such models remain understudied in the DP literature. Our theoretical analysis and experimental results confirm that DP rule list models built with smooth sensitivity achieve higher accuracy than those relying on other DP frameworks based on global sensitivity, for identical privacy budgets.