VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning

Vertical federated learning (VFL) enables multiple parties with disjoint features to collaboratively train models without sharing raw data. While the privacy vulnerabilities of VFL have been extensively studied, its security threats, particularly targeted label attacks, remain underexplored. In such attacks, a passive party perturbs its inputs at inference time to force misclassification into adversary-chosen labels. Existing methods rely on unrealistic assumptions (e.g., access to the VFL model's outputs) and ignore the anomaly detectors deployed in real-world systems. To bridge this gap, we introduce VTarbel, a two-stage, minimal-knowledge attack framework explicitly designed to evade detector-enhanced VFL inference. During the preparation stage, the attacker selects a minimal set of high-expressiveness samples (via maximum mean discrepancy), submits them through the VFL protocol to collect predicted labels, and uses these pseudo-labels to train an estimated detector and a surrogate model on its local features. In the attack stage, these models guide gradient-based perturbations of the remaining samples, crafting adversarial instances that induce targeted misclassifications while evading detection. We implement VTarbel and evaluate it on four model architectures, seven multimodal datasets, and two anomaly detectors. Across all settings, VTarbel outperforms four state-of-the-art baselines, evades detection, and remains effective against three representative privacy-preserving defenses. These results reveal critical security blind spots in current VFL deployments and underscore the urgent need for robust, attack-aware defenses.
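
The abstract does not spell out how the high-expressiveness samples are chosen beyond naming maximum mean discrepancy (MMD). As a rough illustration only, the sketch below assumes a Gaussian (RBF) kernel and a simple greedy strategy that repeatedly adds the sample which brings the selected subset closest, in squared MMD, to the attacker's full local feature set. The function names `rbf`, `mmd2`, and `greedy_select`, the `gamma` parameter, and the greedy procedure itself are hypothetical and are not taken from VTarbel.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(subset, full, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    def mean_k(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(subset, subset) - 2 * mean_k(subset, full) + mean_k(full, full)

def greedy_select(full, k, gamma=1.0):
    """Greedily pick k indices whose samples minimise squared MMD to the full set.

    Hypothetical selection rule for illustration; quadratic cost per candidate,
    so only suitable for small examples.
    """
    chosen = []
    for _ in range(k):
        best = min(
            (i for i in range(len(full)) if i not in chosen),
            key=lambda i: mmd2([full[j] for j in chosen + [i]], full, gamma),
        )
        chosen.append(best)
    return chosen
```

On a toy set with two well-separated clusters, selecting two samples with this rule picks one from each cluster, which matches the intuition that a small subset should cover the feature distribution rather than concentrate in one region.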
