The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme Gradient Boosting (XGBoost) on tabular data raise the need to train XGBoost in the context of federated learning (FL). Existing works on federated XGBoost in the horizontal setting rely on the sharing of gradients, which induces per-node-level communication frequency and raises serious privacy concerns. To alleviate these problems, we develop an innovative framework for horizontal federated XGBoost that does not depend on the sharing of gradients and simultaneously improves privacy and communication efficiency by making the learning rates of the aggregated tree ensembles learnable. We conduct extensive evaluations on various classification and regression datasets, showing that our approach achieves performance comparable to the state-of-the-art method and effectively improves communication efficiency by lowering both communication rounds and communication overhead by factors ranging from 25x to 700x.
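To make the idea of learnable learning rates concrete, the following is a minimal sketch and not the framework's actual implementation. The abstract does not specify how the learning rates are learned, so the sketch assumes the simplest possible setup: the per-tree predictions of an already aggregated ensemble are held fixed, and one scalar learning rate per tree is fitted by plain gradient descent on squared error. The function name `fit_tree_learning_rates` and all parameters are hypothetical.

```python
import numpy as np


def fit_tree_learning_rates(tree_outputs, y, steps=500, lr=0.1):
    """Fit one scalar learning rate per tree (illustrative assumption only).

    tree_outputs: (n_samples, n_trees) array of fixed per-tree predictions
                  from an already aggregated ensemble.
    y:            (n_samples,) array of regression targets.
    Returns an (n_trees,) array of learned learning rates.
    """
    n_samples, n_trees = tree_outputs.shape
    # Start from a uniform average of the trees.
    eta = np.full(n_trees, 1.0 / n_trees)
    for _ in range(steps):
        # Ensemble prediction is the learning-rate-weighted sum of tree outputs.
        residual = tree_outputs @ eta - y
        # Gradient of the mean squared error with respect to eta.
        grad = 2.0 * tree_outputs.T @ residual / n_samples
        eta -= lr * grad
    return eta


# Synthetic demonstration: stand-in "tree outputs" and known target weights.
rng = np.random.default_rng(0)
demo_outputs = rng.normal(size=(200, 4))
demo_true_eta = np.array([0.5, 0.1, 0.3, 0.2])
demo_y = demo_outputs @ demo_true_eta
demo_eta = fit_tree_learning_rates(demo_outputs, demo_y)
```

In this toy setup only the small vector of learning rates is trained on top of the fixed trees. It is meant to illustrate the general mechanism named in the abstract, under the stated assumptions, rather than the authors' training procedure.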