Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure

From arXiv: We introduce the first method for label unlearning in vertical federated learning (VFL), focused on preventing label leakage by the active party.

This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, we propose the first method tailored to label unlearning in VFL, where labels play a dual role as both essential inputs and sensitive information. To this end, we employ a representation-level manifold mixup mechanism to generate synthetic embeddings for both unlearned and retained samples, providing richer signals for the subsequent gradient-based label forgetting and recovery steps. These augmented embeddings are then subjected to gradient-based label forgetting, effectively removing the associated label information from the model. To recover performance on the retained data, we introduce a recovery-phase optimization step that refines the remaining embeddings. This design achieves effective label unlearning while maintaining computational efficiency. Extensive experiments on diverse datasets, including MNIST, CIFAR-10, CIFAR-100, ModelNet, Brain Tumor MRI, COVID-19 Radiography, and Yahoo Answers, demonstrate strong efficacy and scalability. Overall, this work establishes a new direction for unlearning in VFL, showing that re-imagining mixup as an efficient mechanism can unlock practical and utility-preserving unlearning. The code is publicly available at https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning.
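
The three stages named in the abstract (mixup on embeddings, gradient-based forgetting, recovery) can be illustrated with a toy sketch. This is not the authors' implementation; the abstract gives no loss functions or update rules. The sketch assumes that forgetting is a gradient-ascent step on cross-entropy, that recovery is an ordinary descent step, and that a linear softmax head stands in for the active party's top model. All function names and parameters (`mixup`, `step`, `lam`, `lr`) are hypothetical.

```python
import math

def mixup(emb_a, emb_b, lab_a, lab_b, lam):
    """Convexly combine two embeddings and their one-hot labels (manifold mixup)."""
    emb = [lam * a + (1.0 - lam) * b for a, b in zip(emb_a, emb_b)]
    lab = [lam * a + (1.0 - lam) * b for a, b in zip(lab_a, lab_b)]
    return emb, lab

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, emb):
    """Linear softmax head; W holds one weight row per class."""
    return softmax([sum(w * x for w, x in zip(row, emb)) for row in W])

def cross_entropy(W, emb, lab):
    probs = predict(W, emb)
    return -sum(t * math.log(p + 1e-12) for t, p in zip(lab, probs))

def step(W, emb, lab, lr, ascent=False):
    """One gradient step on the head.

    ascent=False lowers the loss (assumed recovery step);
    ascent=True raises it (assumed forgetting step).
    """
    probs = predict(W, emb)
    sign = 1.0 if ascent else -1.0
    # For softmax cross-entropy, dL/dW[k][j] = (probs[k] - lab[k]) * emb[j].
    return [[w + sign * lr * (p - t) * x for w, x in zip(row, emb)]
            for row, p, t in zip(W, probs, lab)]

# Toy data: two classes, two-dimensional embeddings.
forget_emb, forget_lab = [1.0, 0.0], [1.0, 0.0]   # sample whose label is unlearned
retain_emb, retain_lab = [0.0, 1.0], [0.0, 1.0]   # sample whose label is kept

W = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):                                # ordinary training on both samples
    W = step(W, forget_emb, forget_lab, lr=0.5)
    W = step(W, retain_emb, retain_lab, lr=0.5)

# Synthetic embedding close to the forget sample, then one forgetting step on it.
mixed_emb, mixed_lab = mixup(forget_emb, retain_emb, forget_lab, retain_lab, lam=0.9)
W_forgot = step(W, mixed_emb, mixed_lab, lr=0.5, ascent=True)

# Recovery step on the retained sample.
W_final = step(W_forgot, retain_emb, retain_lab, lr=0.5)
```

Comparing `cross_entropy(W, forget_emb, forget_lab)` with the same quantity under `W_forgot` shows how the ascent step degrades the head's fit to the forgotten label, while the final descent step pulls the retained sample's loss back down. A real VFL system would apply such updates across the passive parties' bottom models as well, which this sketch omits.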
