Decision trees are interpretable models that are well suited to non-linear learning problems. Much work has gone into extending decision tree learning algorithms with differential privacy, a framework that provides formal privacy guarantees for the samples in the training data. However, current state-of-the-art algorithms for this purpose sacrifice considerable utility for a small privacy benefit: they either create random decision nodes, which reduces decision tree accuracy, or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works either do not support continuous features or leak information about feature values when data is continuous. We propose PrivaTree, a new method based on private histograms that chooses good splits while consuming only a small privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking additional information. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we prove bounds on the expected success rates of backdoor attacks against differentially private learners. Our experimental results show that PrivaTree consistently outperforms previous works on predictive accuracy and significantly improves robustness against backdoor attacks compared to regular decision trees.
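To make the idea of choosing splits from private histograms concrete, the following is a minimal sketch and not the PrivaTree algorithm itself. It assumes data-independent bin edges (so the edges leak nothing about feature values), add/remove-one neighboring datasets (so each sample changes exactly one histogram cell by 1), Laplace noise with scale 1/epsilon, and Gini impurity as the split criterion. The function names `noisy_class_histogram`, `gini`, and `best_threshold` are hypothetical and chosen for illustration only.

```python
import numpy as np


def noisy_class_histogram(x, y, edges, n_classes, epsilon, rng):
    """Per-bin, per-class counts perturbed with Laplace noise.

    Assumption: edges are fixed in advance and independent of the data.
    Each sample falls into exactly one (bin, class) cell, so under
    add/remove-one neighboring the L1 sensitivity is 1.
    """
    n_bins = len(edges) - 1
    # Use only the interior edges so indices run from 0 to n_bins - 1;
    # values outside the range fall into the first or last bin.
    bins = np.digitize(x, edges[1:-1])
    counts = np.zeros((n_bins, n_classes))
    np.add.at(counts, (bins, y), 1)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    # Clipping is post-processing and does not affect the privacy guarantee.
    return np.clip(noisy, 0.0, None)


def gini(class_counts):
    """Gini impurity of a vector of (possibly noisy) class counts."""
    total = class_counts.sum()
    if total <= 0:
        return 0.0
    p = class_counts / total
    return 1.0 - np.sum(p ** 2)


def best_threshold(hist, edges):
    """Pick the interior bin edge with the lowest weighted Gini impurity."""
    total = max(hist.sum(), 1e-12)
    best_score, best_edge = np.inf, None
    for i in range(1, hist.shape[0]):
        left = hist[:i].sum(axis=0)
        right = hist[i:].sum(axis=0)
        score = (left.sum() * gini(left) + right.sum() * gini(right)) / total
        if score < best_score:
            best_score, best_edge = score, edges[i]
    return best_edge
```

Because the threshold is selected from the noisy histogram alone, the selection step is post-processing and consumes no privacy budget beyond the epsilon spent on the histogram. A caller would pass integer class labels in `y` and a generator such as `np.random.default_rng()` as `rng`.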