In light of increasing privacy concerns and stringent legal regulations, using secure multiparty computation (MPC) to enable collaborative GBDT model training among multiple data owners has garnered significant attention. Despite this, existing MPC-based GBDT frameworks face efficiency challenges due to high communication costs and the computational burden of non-linear operations such as division and sigmoid evaluation. In this work, we introduce Guard-GBDT, an innovative framework tailored for efficient and privacy-preserving GBDT training on vertically partitioned datasets. Guard-GBDT bypasses MPC-unfriendly division and sigmoid functions by using more streamlined approximations, and reduces communication overhead by compressing the messages exchanged during gradient aggregation. We implement a prototype of Guard-GBDT and extensively evaluate its performance and accuracy on various real-world datasets. The results show that Guard-GBDT outperforms the state-of-the-art HEP-XGB (CIKM'21) and SiGBDT (ASIA CCS'24) by up to $2.71\times$ and $12.21\times$ on a LAN, and by up to $2.7\times$ and $8.2\times$ on a WAN. Guard-GBDT also achieves accuracy comparable to SiGBDT and plaintext XGBoost (and better than HEP-XGB), deviating by only $\pm1\%$ to $\pm2\%$. Our implementation is available at https://github.com/XidianNSS/Guard-GBDT.git.
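To make the sigmoid-approximation claim above concrete, the sketch below illustrates the general kind of MPC-friendly piecewise-linear sigmoid substitute common in privacy-preserving ML; it is an illustrative assumption, not Guard-GBDT's actual approximation (see the repository for that). It replaces the exponentiation and division in the exact sigmoid with a clamped first-order Taylor expansion at 0, which needs only multiplication by a public constant, addition, and comparisons, all of which are cheap under secret sharing.

```python
# Minimal sketch (assumption, NOT Guard-GBDT's actual approximation) of an
# MPC-friendly sigmoid substitute evaluated here in plaintext with NumPy.

import numpy as np

def sigmoid_exact(x: np.ndarray) -> np.ndarray:
    """Exact sigmoid 1 / (1 + e^{-x}); uses MPC-unfriendly exp and division."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_linear_clamped(x: np.ndarray) -> np.ndarray:
    """Clamped Taylor expansion at 0: min(max(0.25*x + 0.5, 0), 1)."""
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

if __name__ == "__main__":
    xs = np.linspace(-6.0, 6.0, 1201)
    err = np.abs(sigmoid_exact(xs) - sigmoid_linear_clamped(xs))
    print(f"max |error| on [-6, 6]: {err.max():.3f}")  # roughly 0.12, at x = +/-2
```

The trade-off such substitutes make is a bounded approximation error in the gradient/Hessian computation in exchange for avoiding an expensive secure division or exponentiation subprotocol per evaluation.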