Federated Learning (FL) has gained considerable traction, yet FL for tabular data has received comparatively little attention. Most FL research has focused on neural networks, while Tree-Based Models (TBMs) such as XGBoost have historically performed better on tabular data. Subsampling the training data when building trees has been shown to improve performance, but whether such subsampling also helps in FL remains an open problem. In this paper, we evaluate a histogram-based federated XGBoost that uses Minimal Variance Sampling (MVS). We describe the underlying algorithm and show that our MVS-based model improves both classification accuracy and regression error in a federated setting. In our evaluation, the model using MVS outperforms both uniform (random) sampling and no sampling at all, achieving strong local and global performance on a new set of federated tabular datasets. Federated XGBoost using MVS also outperforms centralized XGBoost in half of the studied cases.
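As a concrete illustration of the sampling step named above, MVS keeps example i with probability p_i = min(1, ĝ_i/μ), where ĝ_i = sqrt(g_i² + λ·h_i²) is the regularized gradient score and μ is chosen so the expected sample size matches the target rate; kept examples are reweighted by 1/p_i to keep gradient sums unbiased. The sketch below is a minimal, hypothetical illustration of this rule, not the paper's implementation; `mvs_sample`, `sample_rate`, and `lam` are illustrative names.

```python
import numpy as np

def mvs_sample(grad, hess, sample_rate=0.5, lam=0.1, seed=None):
    """Minimal Variance Sampling sketch (illustrative, not the paper's code).

    Scores each example by the regularized gradient norm
    g_hat_i = sqrt(g_i^2 + lam * h_i^2), keeps example i with
    probability p_i = min(1, g_hat_i / mu), and reweights kept
    examples by 1 / p_i so gradient statistics stay unbiased.
    """
    rng = np.random.default_rng(seed)
    score = np.sqrt(grad ** 2 + lam * hess ** 2)
    n = len(score)
    k = sample_rate * n  # target expected sample size

    # Binary-search mu so that sum_i min(1, score_i / mu) == k.
    lo, hi = 0.0, float(score.max())
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.minimum(1.0, score / mu).sum() > k:
            lo = mu
        else:
            hi = mu
    p = np.minimum(1.0, score / mu)

    keep = rng.random(n) < p
    weights = 1.0 / p[keep]  # importance weights for the kept examples
    return np.flatnonzero(keep), weights
```

Large-gradient examples (score ≥ μ) are kept deterministically, which is what gives MVS its low sampling variance compared to uniform subsampling.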