The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme Gradient Boosting (XGBoost) on tabular data raise the need to train XGBoost in the context of federated learning (FL). Existing works on federated XGBoost in the horizontal setting rely on the sharing of gradients, which induces per-node level communication frequency and serious privacy concerns. To alleviate these problems, we develop a framework for horizontal federated XGBoost that does not depend on the sharing of gradients and simultaneously improves privacy and communication efficiency by making the learning rates of the aggregated tree ensembles learnable. We conduct extensive evaluations on various classification and regression datasets, showing that our approach achieves performance comparable to the state-of-the-art method and improves communication efficiency by lowering both the number of communication rounds and the communication overhead by factors ranging from 25x to 700x.
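To make the idea of learnable learning rates concrete, the following is a minimal sketch rather than the method evaluated here. It assumes, purely for illustration, that the aggregated ensemble consists of fixed trees whose per-sample predictions are already available as a matrix, and that one scalar learning rate per tree is fitted by plain gradient descent on squared error. The function name `fit_tree_learning_rates`, the synthetic data, and the choice of optimizer are all hypothetical and are not taken from the abstract.

```python
import numpy as np


def fit_tree_learning_rates(tree_outputs, y, steps=500, step_size=0.1):
    """Fit one learning rate per fixed tree by gradient descent on squared error.

    tree_outputs: (n_samples, n_trees) array; column j holds tree j's predictions.
    y: (n_samples,) array of regression targets.
    """
    n_samples, n_trees = tree_outputs.shape
    # Start from a uniform average of the trees (an arbitrary initial guess).
    rates = np.full(n_trees, 1.0 / n_trees)
    for _ in range(steps):
        # Ensemble prediction is the rate-weighted sum of the tree outputs.
        residual = tree_outputs @ rates - y
        # Gradient of the mean squared error with respect to the rates.
        grad = 2.0 * tree_outputs.T @ residual / n_samples
        rates -= step_size * grad
    return rates


# Synthetic stand-in for predictions of trees built locally by clients
# (assumption: the trees themselves stay fixed; only the rates are trained).
rng = np.random.default_rng(0)
tree_outputs = rng.normal(size=(200, 4))
true_rates = np.array([0.5, 0.1, 0.3, 0.1])
y = tree_outputs @ true_rates

learned_rates = fit_tree_learning_rates(tree_outputs, y)
```

In this toy setting only the trees and the small vector of rates would need to be exchanged, not per-sample gradients, which is the intuition behind the privacy and communication claims above. How the rates are actually parameterized and trained in the proposed framework is not specified in the abstract.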