Federated Learning (FL) has lately gained traction because it addresses how machine learning models can be trained on distributed datasets. FL was designed for parametric models, namely Deep Neural Networks (DNNs), and has therefore shown promise on image and text tasks. However, FL for tabular data has received little attention. Tree-Based Models (TBMs) are generally considered to perform better on tabular data, and they are starting to see FL integrations. In this study, we benchmark federated TBMs and DNNs for horizontal FL, with varying data partitions, on 10 well-known tabular datasets. Our novel benchmark results indicate that current federated boosted TBMs perform better than federated DNNs across different data partitions. Furthermore, federated XGBoost outperforms all other models. Lastly, we find that federated TBMs perform better than federated parametric models, even when the number of clients is increased significantly.
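In horizontal FL, every client holds the same feature columns but a disjoint subset of the rows. The abstract does not say how the rows were assigned to clients. The sketch below is therefore only an illustration of what "varying data partitions" can look like. It assumes two schemes that are common in the FL literature: an even IID split, and a label-skewed split whose per-class proportions are drawn from a Dirichlet(`alpha`) distribution. These schemes are not necessarily the ones used in this benchmark, and the helper `partition_rows` is hypothetical.

```python
import numpy as np


def partition_rows(y, n_clients, alpha=None, seed=0):
    """Hypothetical helper: split row indices of a tabular dataset across clients.

    Horizontal FL: each client keeps all feature columns but gets a disjoint
    subset of rows.
      alpha=None  -> IID split: rows are shuffled and dealt out evenly.
      alpha=float -> label-skewed split (assumed scheme): for each class, the
                     share of its rows sent to each client is drawn from
                     Dirichlet(alpha); smaller alpha means stronger skew.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    if alpha is None:
        # IID: shuffle all row indices, then cut into near-equal chunks.
        return list(np.array_split(rng.permutation(n), n_clients))

    clients = [[] for _ in range(n_clients)]
    for label in np.unique(y):
        # Shuffle the rows belonging to this class.
        label_idx = rng.permutation(np.flatnonzero(y == label))
        # Draw per-client proportions for this class and turn them into cut points.
        props = rng.dirichlet(np.full(n_clients, alpha))
        cuts = (np.cumsum(props)[:-1] * len(label_idx)).astype(int)
        for c, part in enumerate(np.split(label_idx, cuts)):
            clients[c].extend(part.tolist())
    return [np.array(sorted(c), dtype=int) for c in clients]


# Usage: 100 rows with a 60/40 binary label, split over 5 clients.
y = np.array([0] * 60 + [1] * 40)
iid_parts = partition_rows(y, n_clients=5)
skewed_parts = partition_rows(y, n_clients=5, alpha=0.5)
print([len(p) for p in iid_parts])
print([int(y[p].mean()) if len(p) else None for p in skewed_parts])
```

Both schemes keep every row on exactly one client. Only the distribution of labels (and client sizes) changes, and that is the kind of heterogeneity a partition-sensitivity benchmark varies.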