Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure

From arXiv: We introduce the first method for label unlearning in vertical federated learning (VFL), focused on preventing label leakage by the active party.

This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, we propose the first method tailored to \textit{label unlearning} in VFL, where labels play a dual role as both essential inputs and sensitive information. To this end, we employ a representation-level manifold mixup mechanism to generate synthetic embeddings for both unlearned and retained samples, which provides richer signals for the subsequent gradient-based label forgetting and recovery steps. These augmented embeddings are then subjected to gradient-based label forgetting, effectively removing the associated label information from the model. To recover performance on the retained data, we introduce a recovery-phase optimization step that refines the remaining embeddings. This design achieves effective label unlearning while maintaining computational efficiency. We validate our method through extensive experiments on diverse datasets, including MNIST, CIFAR-10, CIFAR-100, ModelNet, Brain Tumor MRI, COVID-19 Radiography, and Yahoo Answers, which demonstrate strong efficacy and scalability. Overall, this work establishes a new direction for unlearning in VFL, showing that re-imagining mixup as an efficient mechanism can unlock practical and utility-preserving unlearning. The code is publicly available at \href{https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning}{https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning}.
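To make the three stages named in the abstract concrete (manifold mixup on embeddings, gradient-based forgetting, and recovery), the following is a minimal NumPy sketch. It is not the authors' implementation: the linear classification head `W`, the function names, the Beta-distributed mixing weight, the use of gradient ascent as the forgetting step, and all hyperparameters are assumptions made purely for illustration.

```python
import numpy as np

def manifold_mixup(emb, labels_onehot, alpha=1.0, rng=None):
    """Mix embeddings (and their one-hot labels) with a shuffled copy.

    Assumption: a single Beta(alpha, alpha) weight per batch, as in standard mixup.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(emb))
    mixed_emb = lam * emb + (1.0 - lam) * emb[perm]
    mixed_lab = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_emb, mixed_lab

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss_and_grad(W, emb, targets):
    """Cross-entropy of a linear head W on embeddings, and its gradient w.r.t. W."""
    probs = softmax(emb @ W)
    loss = -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))
    grad = emb.T @ (probs - targets) / len(emb)
    return loss, grad

def unlearn_then_recover(W, forget, retain, lr=0.1,
                         forget_steps=5, recover_steps=20, rng=None):
    """Hypothetical two-phase loop: ascent on forget data, descent on retained data."""
    f_emb, f_lab = forget
    r_emb, r_lab = retain
    for _ in range(forget_steps):
        m_emb, m_lab = manifold_mixup(f_emb, f_lab, rng=rng)
        _, g = ce_loss_and_grad(W, m_emb, m_lab)
        W = W + lr * g  # gradient ASCENT: increase loss on labels to forget
    for _ in range(recover_steps):
        m_emb, m_lab = manifold_mixup(r_emb, r_lab, rng=rng)
        _, g = ce_loss_and_grad(W, m_emb, m_lab)
        W = W - lr * g  # gradient DESCENT: restore utility on retained labels
    return W
```

In this sketch the mixup step lets a handful of forget-set embeddings yield many synthetic training points, which is one plausible reading of the few-shot setting in the title; the actual losses and update rules should be taken from the linked repository.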
