Gradient boosting decision forests, as used by XGBoost or AdaBoost, offer higher accuracy and shorter training times than single decision trees on large datasets. Protocols for private inference over decision trees can preserve the privacy of both the input data and the trees themselves. However, naively extending private inference from decision trees to decision forests by replicating the protocol for every tree leads to impractical running times. In this paper, we propose an efficient private decision forest inference protocol based on homomorphic encryption. We present several optimizations that identify and then remove (approximate) duplication between the trees in a forest, thereby achieving significant reductions in communication and computation cost over the naive approach. To the best of our knowledge, we present the first private inference protocol for highly scalable gradient boosting decision forests. Inference with our protocol, SilentWood, is up to 42.5x faster than the baseline of running the RCC-PDTE protocol by Mahdavi et al. in parallel, up to 27.8x faster than Zama's Concrete ML XGBoost, and 2.94x faster than SoK-GGG's two-party garbled circuit protocol.
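To give a sense of why duplication between trees matters, the following plaintext sketch counts how many threshold comparisons a toy forest needs when every tree is evaluated independently versus when each distinct comparison is evaluated once and shared. It is an illustration only, not the SilentWood protocol: it involves no encryption, it handles only exact (not approximate) duplicates, and the forest, the function names, and the `x[feature] < threshold` convention are all assumptions made for this example.

```python
# Hypothetical toy forest: each tree is a list of (feature_index, threshold)
# pairs, one per internal decision node. All values are made up.
forest = [
    [(0, 5.0), (1, 2.5), (2, 7.0)],
    [(0, 5.0), (1, 2.5), (3, 1.0)],
    [(0, 5.0), (2, 7.0), (3, 1.0)],
]


def count_comparisons(forest):
    """Return (naive, deduplicated) comparison counts for the forest."""
    all_nodes = [node for tree in forest for node in tree]
    naive = len(all_nodes)             # one comparison per node, per tree
    deduplicated = len(set(all_nodes)) # each distinct comparison only once
    return naive, deduplicated


def evaluate_shared(forest, x):
    """Evaluate each distinct (feature, threshold) comparison exactly once."""
    results = {}
    for tree in forest:
        for feature, threshold in tree:
            key = (feature, threshold)
            if key not in results:
                # Assumed convention: a node tests x[feature] < threshold.
                results[key] = x[feature] < threshold
    return results


naive, deduplicated = count_comparisons(forest)
shared = evaluate_shared(forest, [4.0, 3.0, 8.0, 0.5])
```

In this toy forest the nine per-node comparisons collapse to four distinct ones. In a private setting each comparison is far more expensive than in plaintext, so sharing of this kind is one plausible source of the savings the abstract describes.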