Data are increasingly distributed across multiple organizations, while data security is becoming ever more important. Federated Learning (FL), which enables multiple parties to collaboratively train a model without exchanging raw data, has therefore attracted growing attention. Depending on how the data are distributed, FL can be realized in three scenarios: horizontal, vertical, and hybrid. In this paper, we combine distributed machine learning techniques with vertical FL and propose a Distributed Vertical Federated Learning (DVFL) approach. DVFL exploits a fully distributed architecture within each party to accelerate training. In addition, we use Homomorphic Encryption (HE) to protect the data against honest-but-curious participants. We conduct extensive experiments in a large-scale cluster environment and a cloud environment to evaluate the efficiency and scalability of the proposed approach. The experiments demonstrate that our approach scales well and offers a significant efficiency advantage over baseline frameworks, with a training-time speedup of up to 6.8 times with a single server and up to 15.1 times with multiple servers.