Machine unlearning for large language models often faces a privacy dilemma: strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic, privacy-preserving Multiple Perturbed Copies Unlearning framework built around two server-side modules: Pre-Process for randomized copy generation and Post-Process for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to run unlearning locally on its private forget set without access to the server's exact original parameters. After local unlearning, the server performs Post-Process, inverting the reparameterization and aggregating the updates with a harmonic denoising procedure to mitigate the impact of the perturbation. Experiments with seven unlearning algorithms show that MPU achieves unlearning performance comparable to noise-free baselines: for most algorithms, average degradation stays well below 1% at noise levels up to 10%, and some algorithms even outperform the noise-free baseline at 1% noise. Code is available at https://github.com/Tristan0318/MPU.
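The Pre-Process, local unlearning, and Post-Process flow can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the MPU code: the function names `pre_process`, `client_unlearn`, and `post_process` are hypothetical, a coordinate permutation stands in for the reparameterization, antithetic noise pairs with plain averaging stand in for the harmonic denoising procedure (whose details are not given here), the noise level is interpreted as a fraction of the mean absolute parameter value, and the client's "unlearning" is a single gradient-ascent step on a toy symmetric quadratic loss.

```python
import numpy as np


def pre_process(theta, num_pairs, noise_scale, rng):
    """Server side (hypothetical): build perturbed, permuted copies of theta."""
    copies, perms = [], []
    for _ in range(num_pairs):
        # Assumption: noise magnitude is relative to the mean |parameter|.
        eps = noise_scale * np.abs(theta).mean() * rng.standard_normal(theta.shape)
        for signed_eps in (eps, -eps):  # antithetic pair: the two noises sum to zero
            perm = rng.permutation(theta.size)  # stand-in reparameterization
            copies.append((theta + signed_eps)[perm])
            perms.append(perm)
    return copies, perms


def client_unlearn(params, forget_x, forget_y, lr=0.1):
    """Client side (toy): one gradient-ascent step on the private forget set.

    Toy loss: 0.5 * mean_over_samples(sum_j (params[j] * x - y)**2).
    It is symmetric in the coordinates, so permuting params does not change it.
    """
    a = np.mean(forget_x ** 2)
    b = np.mean(forget_x * forget_y)
    grad = a * params - b       # per-coordinate gradient of the toy loss
    return params + lr * grad   # ascent: increase the loss on the forget set


def post_process(theta, copies, updated, perms):
    """Server side (hypothetical): invert the permutation and average updates."""
    deltas = []
    for sent, received, perm in zip(copies, updated, perms):
        delta_permuted = received - sent
        delta = np.empty_like(delta_permuted)
        delta[perm] = delta_permuted  # undo the coordinate permutation
        deltas.append(delta)
    # Plain mean as a stand-in for the denoising aggregation step.
    return theta + np.mean(deltas, axis=0)


# Toy run: the client never sees theta, the server never sees the forget set.
rng = np.random.default_rng(0)
theta = rng.standard_normal(8)
forget_x = rng.standard_normal(32)
forget_y = rng.standard_normal(32)

copies, perms = pre_process(theta, num_pairs=2, noise_scale=0.1, rng=rng)
updated = [client_unlearn(c, forget_x, forget_y) for c in copies]
theta_new = post_process(theta, copies, updated, perms)
```

Because the toy gradient is linear in the parameters, the antithetic noise cancels exactly and `theta_new` matches the result of unlearning on the unperturbed `theta`. With a nonlinear objective, as in a real language model, such cancellation would hold only approximately, which is presumably where a dedicated denoising step matters.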