Decision trees are interpretable models that are well suited to non-linear learning problems. Much work has been done on extending decision tree learning algorithms with differential privacy, a framework that guarantees the privacy of individual samples within the training data. However, current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit. These solutions either create random decision nodes, which reduces decision tree accuracy, or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works do not support continuous features or leak information about them. We propose a new method, PrivaTree, based on private histograms, that chooses good splits while consuming only a small share of the privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking information about numerical features. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we demonstrate bounds for the expected accuracy and success rates of backdoor attacks against differentially private learners. By leveraging the better privacy-utility trade-off of PrivaTree, we are able to train decision trees with significantly better robustness against backdoor attacks than regular decision trees, and with meaningful theoretical guarantees.
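To make the idea of choosing splits from a private histogram concrete, the following is a minimal sketch of the general technique, not the PrivaTree algorithm itself. It assumes add/remove-one neighbouring datasets (so each sample changes exactly one histogram cell by 1), Laplace noise with scale 1/epsilon, bin edges fixed in advance from a publicly known feature range rather than from the data, and Gini impurity as the split criterion. All function names and parameters are hypothetical, and the sketch does not account for composing the budget across features, nodes, or leaf labeling.

```python
import bisect
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(values, labels, edges, n_classes, epsilon, rng):
    """Noisy per-bin, per-class counts over fixed, data-independent bin edges."""
    n_bins = len(edges) - 1
    counts = [[0.0] * n_classes for _ in range(n_bins)]
    for x, y in zip(values, labels):
        # Bin i covers [edges[i], edges[i + 1]); out-of-range values are clamped.
        b = min(max(bisect.bisect_right(edges, x) - 1, 0), n_bins - 1)
        counts[b][y] += 1
    # Assumption: each sample touches one cell, so the L1 sensitivity is 1.
    # Clamping at zero is post-processing and does not affect the guarantee.
    return [
        [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in row]
        for row in counts
    ]


def gini(class_counts):
    total = sum(class_counts)
    if total <= 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def best_split(noisy_counts, edges):
    """Return (threshold, weighted Gini) of the best split 'x < threshold'."""
    n_classes = len(noisy_counts[0])
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(noisy_counts)):
        left = [sum(row[k] for row in noisy_counts[:i]) for k in range(n_classes)]
        right = [sum(row[k] for row in noisy_counts[i:]) for k in range(n_classes)]
        n_left, n_right = sum(left), sum(right)
        if n_left + n_right <= 0:
            continue
        score = (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
        if score < best_score:
            best_threshold, best_score = edges[i], score
    return best_threshold, best_score


# Toy data: one numerical feature in [0, 200), class 1 iff the value is >= 100.
values = list(range(200))
labels = [0 if v < 100 else 1 for v in values]
edges = [0, 40, 80, 100, 120, 160, 200]  # assumed known without looking at the data

hist = private_histogram(values, labels, edges, n_classes=2, epsilon=1.0,
                         rng=random.Random(0))
threshold, score = best_split(hist, edges)
```

Only the noisy counts are used after the histogram is released, so every candidate threshold is evaluated without touching the raw feature values again. Smaller values of `epsilon` add more noise and make the selected threshold less reliable.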