In recent years, data are typically distributed across multiple organizations, while data security is becoming increasingly important. Federated Learning (FL), which enables multiple parties to collaboratively train a model without exchanging raw data, has attracted growing attention. Depending on how the data are distributed, FL can be realized in three scenarios: horizontal, vertical, and hybrid. In this paper, we combine distributed machine learning techniques with Vertical FL and propose a Distributed Vertical Federated Learning (DVFL) approach. DVFL exploits a fully distributed architecture within each party to accelerate the training process. In addition, we exploit Homomorphic Encryption (HE) to protect the data against honest-but-curious participants. We conduct extensive experiments in a large-scale cluster environment and a cloud environment to assess the efficiency and scalability of the proposed approach. The results demonstrate that our approach scales well and significantly reduces training time compared with baseline frameworks, achieving speedups of up to 6.8 times with a single server and up to 15.1 times with multiple servers.
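The abstract does not specify which HE scheme is used or exactly what the parties exchange. As a minimal sketch, assuming a Paillier (additively homomorphic) scheme and a common two-party masked-gradient pattern from the vertical FL literature, the Python below shows how a party that holds only a feature column can compute the gradient term for its weight from encrypted residuals it never sees in plaintext. This is not necessarily DVFL's protocol. All function names (`keygen`, `encrypted_gradient`, etc.) are hypothetical, and the toy key size is insecure.

```python
import math
import random

# ASSUMPTIONS: Paillier is used only for illustration (the abstract does not
# name the HE scheme); the ~40-bit modulus is insecure; the masked two-party
# gradient flow is a common VFL pattern, not necessarily DVFL's protocol.

def is_prime(m):
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def next_prime(m):
    while not is_prime(m):
        m += 1
    return m

def keygen():
    p = next_prime(10**6)
    q = next_prime(p + 1)
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because we fix the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add(n, c1, c2):       # Enc(a) * Enc(b) decrypts to a + b
    return c1 * c2 % (n * n)

def mul_const(n, c, k):   # Enc(a) ** k decrypts to k * a
    return pow(c, k % n, n * n)

SCALE = 1000  # fixed-point scale for real-valued plaintexts

def encode(n, v):
    return round(v * SCALE) % n

def decode(n, m, scale):
    if m > n // 2:        # upper half of Z_n represents negative values
        m -= n
    return m / scale

def encrypted_gradient(x_a, residuals):
    """Party B (label holder) owns the key; party A owns feature column x_a."""
    n, sk = keygen()                                        # B's key pair
    enc_d = [encrypt(n, encode(n, d)) for d in residuals]   # B -> A
    acc = encrypt(n, 0)                                     # A: Enc(sum d_i * x_i)
    for c, x in zip(enc_d, x_a):
        acc = add(n, acc, mul_const(n, c, round(x * SCALE)))
    mask = random.randrange(n)                              # A hides the result from B
    plain = decrypt(n, sk, add(n, acc, encrypt(n, mask)))   # B decrypts masked value
    unmasked = (plain - mask) % n                           # A removes its mask
    return decode(n, unmasked, SCALE * SCALE) / len(x_a)
```

For example, `encrypted_gradient([0.5, -1.2, 2.0], [0.1, -0.3, 0.25])` matches the plaintext mean of `d_i * x_i` (0.91 / 3). Masking with a uniformly random element of Z_n lets the key holder decrypt without learning the gradient itself.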