Secure Vertical Federated Learning Under Unreliable Connectivity

Most work in privacy-preserving federated learning (FL) has focused on horizontally partitioned datasets, where clients hold the same features and independently train complete client-level models. In vertical FL (VFL) settings, however, individual data points are often scattered across different institutions, known as clients. Training in this setting requires participants to exchange intermediate outputs and gradients, which creates a risk of privacy leakage and slows convergence. In many real-world scenarios, VFL training also faces client stragglers and dropouts, a serious challenge that can significantly hinder training but has been largely overlooked in existing studies. In this work, we present vFedSec, the first dropout-tolerant VFL protocol, which supports the most generalized vertical framework. It achieves secure and efficient model training by using an innovative Secure Layer alongside an embedding-padding technique. We provide a theoretical proof that our design attains enhanced security while maintaining training performance. Empirical results from extensive experiments also demonstrate that vFedSec is robust to client dropout and provides secure training with negligible computation and communication overhead. Compared to widely adopted homomorphic encryption (HE) methods, our approach achieves a speedup of more than 690x and reduces communication costs by more than 9.6x.
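To make the exchange of intermediate outputs and gradients concrete, the following is a minimal sketch of one plain, unprotected VFL training step. It is a generic illustration under simple assumptions (a linear model, scalar embeddings, squared-error loss), not the vFedSec protocol: the Secure Layer and embedding padding are not modeled, and every name here (`Client`, `server_step`, `bank`, `shop`) is hypothetical.

```python
# Generic vertical-FL training step; NOT the vFedSec protocol.
# All class and function names are hypothetical and for illustration only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class Client:
    """Holds a private slice of the features for each sample, plus local weights."""

    def __init__(self, features, weights):
        self.features = features  # {sample_id: [feature values]}
        self.weights = weights    # one weight per local feature

    def embed(self, sample_id):
        # Intermediate output sent to the server: here a scalar embedding.
        return dot(self.weights, self.features[sample_id])

    def apply_gradient(self, sample_id, grad_embedding, lr):
        # Chain rule: d(loss)/d(w_j) = grad_embedding * x_j.
        x = self.features[sample_id]
        self.weights = [w - lr * grad_embedding * xj
                        for w, xj in zip(self.weights, x)]


def server_step(clients, sample_id, label, lr):
    # The server combines the client embeddings into a prediction.
    prediction = sum(c.embed(sample_id) for c in clients)
    error = prediction - label
    loss = 0.5 * error ** 2
    # For a sum, the gradient with respect to each embedding is the error.
    for c in clients:
        c.apply_gradient(sample_id, error, lr)
    return loss


# Two institutions hold different features of the same sample "u1".
bank = Client({"u1": [1.0, 2.0]}, [0.0, 0.0])
shop = Client({"u1": [3.0]}, [0.0])
losses = [server_step([bank, shop], "u1", label=1.0, lr=0.05)
          for _ in range(20)]
```

In this unprotected form the server sees every embedding in the clear, and each client receives a gradient that depends on the label. These are the exchanges the abstract identifies as privacy risks, and a client that fails to send its embedding would stall the step.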


