Federated Learning (FL) has emerged as a significant trend in machine learning and artificial intelligence. It allows multiple participants to collaboratively train a better global model and offers a privacy-aware paradigm for model training, since it does not require participants to release their original training data. However, existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this paper, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both vertically and horizontally partitioned data. Vertical FederBoost does not require any cryptographic operation, and horizontal FederBoost only requires lightweight secure aggregation. The key observation is that the whole training process of GBDT relies on the ordering of the data instead of the values. We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments performed on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of accuracy as centralized training, where all data are collected by a central server, and they are 4-5 orders of magnitude faster than the state-of-the-art solutions for federated decision tree training, hence offering practical solutions for industrial applications.
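The key observation above, that GBDT split finding depends only on the ordering of each feature rather than its raw values, can be illustrated with a minimal sketch. The helper `best_split_partition` below is hypothetical (not part of FederBoost); it uses the standard second-order split-gain formula and shows that replacing feature values with their ranks leaves the chosen split partition unchanged:

```python
import numpy as np

def best_split_partition(feature, grad, hess, lam=1.0):
    """Return the set of sample indices sent left by the best split.

    The gain formula G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)
    is evaluated while scanning candidates in sorted feature order,
    so the result depends only on the ordering of `feature`.
    """
    order = np.argsort(feature, kind="stable")
    g, h = grad[order], hess[order]
    G, H = g.sum(), h.sum()
    gl = hl = 0.0
    best_gain, best_k = -np.inf, 1
    for k in range(1, len(feature)):
        gl += g[k - 1]
        hl += h[k - 1]
        gain = gl**2 / (hl + lam) + (G - gl)**2 / (H - hl + lam) - G**2 / (H + lam)
        if gain > best_gain:
            best_gain, best_k = gain, k
    return frozenset(order[:best_k].tolist())

rng = np.random.default_rng(0)
x = rng.normal(size=50)          # raw feature values
g = rng.normal(size=50)          # per-sample gradients
h = np.ones(50)                  # per-sample hessians
ranks = np.argsort(np.argsort(x)).astype(float)  # keep only the ordering
assert best_split_partition(x, g, h) == best_split_partition(ranks, g, h)
```

Because only the ordering matters, parties can in principle share rank or bucket information instead of raw feature values, which is what removes the need for heavy cryptography in the vertical setting.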