Federated unlearning has become an attractive approach to addressing privacy concerns in collaborative machine learning, particularly when models memorize sensitive data during training. It enables the removal of specific data influences from trained models, aligning with the growing emphasis on the "right to be forgotten." While extensively studied in horizontal federated learning, unlearning in vertical federated learning (VFL) remains challenging due to its distributed feature architecture. VFL unlearning includes sample unlearning, which removes the influence of specific data points, and label unlearning, which removes entire classes. Because different parties hold complementary features of the same samples, unlearning requires cross-party coordination, introducing computational overhead and complexities arising from feature interdependencies. To address these challenges, we propose FedORA (Federated Optimization for data Removal via primal-dual Algorithm), designed for both sample and label unlearning in VFL. FedORA formulates the removal of specific samples or labels as a constrained optimization problem solved within a primal-dual framework. Our approach introduces a new unlearning loss function that promotes classification uncertainty rather than misclassification. An adaptive step size enhances stability, while an asymmetric batch design, which accounts for the prior influence of the retained data on the model, treats unlearning and retained data differently to reduce computational cost. We provide a theoretical analysis proving that the model difference between FedORA and Train-from-scratch is bounded, establishing guarantees for unlearning effectiveness. Experiments on tabular and image datasets demonstrate that FedORA achieves unlearning effectiveness and utility preservation comparable to Train-from-scratch, with reduced computation and communication overhead.
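The abstract mentions an unlearning loss that promotes classification uncertainty rather than misclassification. The exact form of FedORA's loss is not given here; as a minimal illustration of the idea, one common choice is the KL divergence from the predicted class distribution to the uniform distribution, which is zero when the model is maximally uncertain and grows as predictions become confident. The function below is a hypothetical sketch of such a loss, not the paper's formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def uncertainty_loss(logits):
    """Hypothetical uncertainty-promoting unlearning loss (illustration only):
    KL(p || uniform) = sum_i p_i * log(p_i * K) for K classes.
    Minimizing this on forgotten samples drives the model toward
    maximum uncertainty instead of deliberate misclassification."""
    k = len(logits)
    p = softmax(logits)
    return sum(pi * math.log(pi * k) for pi in p if pi > 0.0)
```

On uniform logits the loss is exactly zero, and it increases as the prediction concentrates on one class, so gradient descent on it spreads probability mass rather than flipping the label.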