Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic, privacy-preserving Multiple Perturbed Copies Unlearning framework that introduces two server-side modules: Pre-Process, for randomized copy generation, and Post-Process, for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to execute unlearning locally on its private forget set without ever accessing the server's exact original parameters. After local unlearning, the server performs Post-Process by inverting the reparameterization and aggregating the updates with a harmonic denoising procedure that mitigates the impact of the injected perturbation. Experiments with seven unlearning algorithms show that MPU achieves unlearning performance comparable to noise-free baselines, with average degradation well below 1% for most algorithms under 10% noise, and even outperforms the noise-free baseline for some algorithms under 1% noise. Code is available at https://github.com/Tristan-SHU/MPU.
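The pipeline described above can be sketched numerically. This is a minimal illustration, not the paper's implementation: the reparameterization is shown as a random permutation, the client's unlearning algorithm as a placeholder shrinkage update, and the harmonic denoising step as a simple mean over the recovered updates — all three are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, sigma = 8, 4, 0.1  # param dimension, number of copies, noise scale (~10%)

theta = rng.normal(size=d)  # server's private parameters (never sent in the clear)

# Pre-Process (server): build K perturbed, reparameterized copies.
# A random permutation stands in for the paper's reparameterization scheme.
perms = [rng.permutation(d) for _ in range(K)]
noises = [sigma * rng.normal(size=d) for _ in range(K)]
copies = [(theta + n)[p] for p, n in zip(perms, noises)]

# Client: runs its unlearning algorithm locally on each copy using the
# private forget set; a generic shrinkage update is used as a placeholder.
def local_unlearn(w):
    return w - 0.05 * w

updated = [local_unlearn(c) for c in copies]

# Post-Process (server): invert each reparameterization, recover the
# per-copy update, and aggregate. A plain mean stands in for the paper's
# harmonic denoising; averaging over copies cancels the injected noise.
deltas = []
for p, n, u in zip(perms, noises, updated):
    w = np.empty(d)
    w[p] = u                      # undo the permutation
    deltas.append(w - (theta + n))  # update relative to that perturbed copy
theta_unlearned = theta + np.mean(deltas, axis=0)
```

With the placeholder update, `theta_unlearned` lands close to `0.95 * theta`, with the residual shrinking as the number of copies `K` grows — the averaging step is what keeps the client-side perturbation from contaminating the final parameters.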