Certifying the Right to Be Forgotten: Primal-Dual Optimization for Sample and Label Unlearning in Vertical Federated Learning

Federated unlearning has emerged as an attractive way to address privacy concerns in collaborative machine learning when models memorize sensitive data during training. It removes the influence of specific data from a trained model, in line with the growing emphasis on the "right to be forgotten." Although unlearning has been studied extensively in horizontal federated learning, it remains challenging in vertical federated learning (VFL) because features are distributed across parties. VFL unlearning covers two tasks: sample unlearning, which removes the influence of specific data points, and label unlearning, which removes entire classes. Because different parties hold complementary features of the same samples, both tasks require cross-party coordination, which adds computational overhead and introduces complexity from feature interdependencies. To address these challenges, we propose FedORA (Federated Optimization for data Removal via primal-dual Algorithm), designed for sample and label unlearning in VFL. FedORA formulates the removal of given samples or labels as a constrained optimization problem and solves it within a primal-dual framework. Our approach introduces a new unlearning loss function that promotes classification uncertainty rather than misclassification. An adaptive step size improves stability, while an asymmetric batch design, which accounts for the influence the remaining data has already had on the model, treats unlearning data and retained data differently to reduce computational cost. We provide a theoretical analysis proving that the difference between the model produced by FedORA and a model retrained from scratch is bounded, which establishes guarantees for unlearning effectiveness. Experiments on tabular and image datasets demonstrate that FedORA achieves unlearning effectiveness and utility preservation comparable to the Retrain baseline, with reduced computation and communication overhead.
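To make the two central ideas concrete, the following is a minimal illustrative sketch, not FedORA itself. It assumes (our choice, not stated in the abstract) that "promoting classification uncertainty" can be modeled as the KL divergence from the uniform distribution to the model's softmax output, and it runs a generic projected primal-dual (gradient descent-ascent) loop on a scalar toy problem. The names `uncertainty_loss` and `primal_dual_toy`, the toy objective and constraint, and the fixed step sizes are all hypothetical; the adaptive step size and asymmetric batching described above are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_loss(logits):
    """Assumed stand-in for an uncertainty-promoting unlearning loss:
    KL(uniform || softmax(logits)). It is zero exactly when the prediction
    is uniform, so minimizing it pushes toward uncertainty rather than
    toward a specific wrong class."""
    probs = softmax(logits)
    k = len(probs)
    return sum((1.0 / k) * math.log((1.0 / k) / p) for p in probs)


def primal_dual_toy(steps=2000, eta_w=0.05, eta_lam=0.05, eps=1.0):
    """Generic primal-dual iteration on a hypothetical scalar problem:

        minimize   f(w) = (w - 2)^2     (stand-in for the unlearning objective)
        subject to g(w) = w^2 <= eps    (stand-in for a retained-data constraint)

    Lagrangian: L(w, lam) = f(w) + lam * (g(w) - eps).
    """
    w, lam = 0.0, 0.0
    for _ in range(steps):
        grad_w = 2.0 * (w - 2.0) + lam * 2.0 * w  # dL/dw
        violation = w * w - eps                   # dL/dlam
        w = w - eta_w * grad_w                    # primal descent step
        lam = max(0.0, lam + eta_lam * violation) # projected dual ascent step
    return w, lam


if __name__ == "__main__":
    print(uncertainty_loss([0.0, 0.0, 0.0]))  # uniform prediction
    print(uncertainty_loss([5.0, 0.0, 0.0]))  # confident prediction
    print(primal_dual_toy())
```

In the toy problem the unconstrained minimizer (w = 2) violates the constraint, so the dual variable grows until the iterate settles on the constraint boundary; this is the general mechanism by which a primal-dual scheme trades off an objective against a constraint, under the assumptions stated above.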
