In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretic privacy against colluding clients and a curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly by decrypting the computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.
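To make the idea of lossless aggregation from non-straggling clients concrete, the following is a minimal sketch using textbook Shamir secret sharing over a prime field. It is not the FedVS protocol itself, whose sharing scheme is not specified in this abstract; the names `PRIME`, `share`, `reconstruct`, and `demo`, the toy scalar "embeddings", and the parameters `n = 5` and `t = 2` are all assumptions chosen for illustration. Each client shares its quantized embedding with all clients, each client sums the shares it holds, and the server recovers the exact sum from any `t + 1` responding clients.

```python
import secrets

# Assumed field modulus; any prime larger than the largest possible aggregate works.
PRIME = 2**31 - 1


def share(secret, n, t):
    """Split `secret` into n Shamir shares using a random degree-t polynomial.

    Any t or fewer shares are statistically independent of the secret.
    """
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t)]
    # The share for client x is the polynomial evaluated at x, for x = 1..n.
    return [
        sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
        for x in range(1, n + 1)
    ]


def reconstruct(points):
    """Lagrange-interpolate at x = 0 from a dict {x: y} with at least t + 1 entries."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def demo():
    n, t = 5, 2                      # hypothetical: 5 clients, tolerate 2 colluders
    embeddings = [12, 7, 30, 5, 9]   # toy quantized scalar embeddings, one per client
    # all_shares[i][j] is the share of client i's embedding held by client j + 1.
    all_shares = [share(e, n, t) for e in embeddings]
    # Each client locally adds up the shares it holds (shares are additively homomorphic).
    summed = {
        j + 1: sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    }
    # Suppose clients 2 and 5 straggle; the server only hears from clients 1, 3, and 4.
    survivors = {x: summed[x] for x in (1, 3, 4)}
    return reconstruct(survivors), sum(embeddings)


if __name__ == "__main__":
    recovered, expected = demo()
    print(recovered == expected)
```

Because the sum of degree-`t` polynomials is again a degree-`t` polynomial, any `t + 1` summed shares determine the aggregate exactly, so up to `n - t - 1` stragglers can be ignored without loss. Real embeddings are vectors of real numbers and would need quantization into the field, which this sketch omits.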