Vertical federated learning (VFL), a variant of federated learning (FL), has recently drawn increasing attention because it matches enterprises' demand to leverage more valuable features for better model performance. However, conventional VFL methods may run into data deficiency because they exploit only samples that are both aligned across parties and labeled, often leaving the majority of unaligned and unlabeled samples unused. This data deficiency hampers the effort of the federation. In this work, we propose a Federated Hybrid Self-Supervised Learning framework, named FedHSSL, that utilizes cross-party views (i.e., dispersed features) of samples aligned among parties and local views (i.e., augmentations) of unaligned samples within each party to improve the representation learning capability of the VFL joint model. FedHSSL further exploits invariant features across parties to boost the performance of the joint model through partial model aggregation. As a framework, FedHSSL can work with various representative self-supervised learning (SSL) methods. We empirically demonstrate that FedHSSL methods outperform baselines by large margins. We also provide an in-depth analysis of FedHSSL regarding label leakage, which is rarely investigated in existing self-supervised VFL works. The experimental results show that, with proper protection, FedHSSL achieves the best privacy-utility trade-off against the state-of-the-art label inference attack compared with baselines. Code is available at \url{https://github.com/jorghyq2016/FedHSSL}.
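The partial model aggregation mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each party's model splits into a party-specific part (whose shape depends on that party's features) and a shared part of identical shape across parties, and that aggregation is a plain average of the shared part. The function name `aggregate_partial`, the `top.` and `bottom.` parameter prefixes, and the toy weights are hypothetical and are not taken from the FedHSSL code base.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def aggregate_partial(party_models: List[Params],
                      shared_prefix: str = "top.") -> List[Params]:
    """Average only the parameters whose name starts with `shared_prefix`.

    All other parameters are treated as party-specific and left untouched.
    Assumes every party holds the shared parameters with identical shapes.
    """
    if not party_models:
        return []
    shared_names = [n for n in party_models[0] if n.startswith(shared_prefix)]
    averaged: Params = {}
    for name in shared_names:
        # Group the i-th weight of every party together, then average.
        columns = zip(*(m[name] for m in party_models))
        averaged[name] = [sum(col) / len(party_models) for col in columns]
    # Each party keeps its own weights and receives a copy of the averaged ones.
    return [{**m, **{n: list(v) for n, v in averaged.items()}}
            for m in party_models]


# Toy example: two parties with different party-specific shapes.
party_a = {"bottom.w": [1.0, 2.0, 3.0], "top.w": [0.0, 2.0]}
party_b = {"bottom.w": [5.0], "top.w": [4.0, 6.0]}
new_a, new_b = aggregate_partial([party_a, party_b])
```

In this sketch only the `top.` weights are exchanged and averaged, so parties with different feature dimensions can still share the part of the model intended to capture party-invariant features.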