Although vertical federated learning (VFL) is generally considered privacy-preserving, recent studies have shown that VFL systems are vulnerable to label inference attacks originating from various attack surfaces. Among these, the model completion (MC) attack is currently the most powerful. Existing defenses against it either sacrifice model accuracy or incur impractical computational overhead. In this paper, we propose VMask, a novel label privacy protection framework that defends against the MC attack from the perspective of layer masking. Our key insight is to disrupt the strong correlation between input data and intermediate outputs by applying secret sharing (SS) to mask layer parameters in the attacker's model. We devise a strategy for selecting critical layers to mask, which reduces the overhead that would arise from naively applying SS to the entire model. Moreover, VMask is the first framework to offer defenders a tunable privacy budget, allowing flexible control over the level of label privacy according to actual requirements. We built a VFL system, implemented VMask on it, and extensively evaluated it using five model architectures and 13 datasets of different modalities, comparing it with 12 other defense methods. The results demonstrate that VMask achieves the best privacy-utility trade-off, thwarting the MC attack (reducing label inference accuracy to the level of random guessing) while preserving model performance (e.g., for Transformer-based models, the average drop in VFL model accuracy is only 0.09%). VMask runs up to 60,846 times faster than cryptography-based methods, and its runtime exceeds that of standard VFL by only 1.8 times for a large Transformer-based model, which is generally acceptable.
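To make the layer-masking idea concrete, the following is a minimal sketch of two-party additive secret sharing applied to the weights of a single linear layer. It is not VMask's actual protocol: the names `encode`, `decode`, `share`, and `FRAC_BITS` are hypothetical, the fixed-point encoding over the ring of integers modulo 2^64 is an assumption chosen for illustration, and the input `x` is kept in plaintext, whereas a real deployment would need a secure multiplication protocol between the parties. The sketch only illustrates that each share on its own is a uniformly random matrix, while the two partial outputs still sum to the true layer output.

```python
import numpy as np

FRAC_BITS = 16  # hypothetical fixed-point precision (assumption, not from the paper)


def encode(a, frac_bits=FRAC_BITS):
    """Encode a float array as fixed-point elements of the ring Z_{2^64}."""
    scaled = np.round(a * (1 << frac_bits)).astype(np.int64)
    return scaled.view(np.uint64)  # two's-complement reinterpretation


def decode(a, frac_bits=FRAC_BITS):
    """Map ring elements back to floats (inverse of encode)."""
    return a.view(np.int64).astype(np.float64) / 2.0 ** frac_bits


def share(secret, rng):
    """Split a ring-encoded array into two additive shares.

    s1 is uniformly random; s0 = secret - s1 (mod 2^64), so either share
    alone is uniformly distributed and independent of the secret.
    """
    s1 = rng.integers(0, np.iinfo(np.uint64).max, size=secret.shape,
                      dtype=np.uint64, endpoint=True)
    s0 = secret - s1  # uint64 array arithmetic wraps modulo 2^64
    return s0, s1


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))   # plaintext weights of the layer to be masked
x = rng.normal(size=(2, 8))   # a small batch of inputs

s0, s1 = share(encode(w), rng)

# Each party multiplies the encoded input by its own share only.
x_enc = encode(x)
y0 = x_enc @ s0
y1 = x_enc @ s1

# Adding the partial results reconstructs the output. The product of two
# fixed-point values carries 2 * FRAC_BITS fractional bits.
y = decode(y0 + y1, 2 * FRAC_BITS)
```

In this sketch, a party holding only `s0` sees a matrix that is independent of `w`, which is the intuition behind breaking the correlation between inputs and intermediate outputs. Sharing only selected layers rather than every layer, as the abstract describes, would limit how many such share-wise multiplications are needed.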