Machine unlearning aims to remove the influence of specified training data (e.g., sensitive or copyrighted material) from a model. A prominent approach is to fine-tune an existing model with an unlearning loss that also preserves overall utility. The space of suitable unlearning loss functions is vast, making the search for an optimal loss daunting. Moreover, a universally optimal loss function may not even exist: differences in the structure and overlap of the forget and retain data can make a loss work well in one setting yet over-unlearn or under-unlearn in another. Our approach, EvoMU, tackles both challenges simultaneously: an evolutionary search procedure automatically discovers task-specific losses in the vast space of possible unlearning loss functions. This lets us find dataset-specific losses that match or outperform existing losses from the literature, without a human in the loop. Our work is therefore an instance of automatic scientific discovery, also known as an AI co-scientist. In contrast to previous AI co-scientist work, we operate on a budget: we achieve state-of-the-art results using a small 4B-parameter model (Qwen3-4B-Thinking), demonstrating the potential of AI co-scientists under limited computational resources. Our experimental evaluation shows that the synthesized unlearning losses surpass previous loss-based unlearning formulations on TOFU-5%, TOFU-10%, MUSE, and WMDP. Our code is available at https://github.com/Batorskq/EvoMU.
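The evolutionary search over loss functions can be illustrated with a minimal sketch. All names, the candidate losses, the mutation operator, and the fitness proxy below are illustrative assumptions for exposition, not EvoMU's actual implementation: candidates are toy functions of a forget score and a retain score, and fitness rewards strong forgetting alongside high retained utility.

```python
import random

# Hypothetical sketch of evolutionary loss search (not EvoMU's real code).
# Each candidate maps (forget_score, retain_score) to a scalar loss value.
CANDIDATES = [
    lambda f, r: f - r,        # gradient-ascent-style forget term with retain reward
    lambda f, r: f - 0.5 * r,  # weaker retain regularization
    lambda f, r: 2 * f - r,    # stronger forgetting pressure
]

def fitness(loss_fn):
    """Toy fitness proxy: evaluate the loss at fixed stand-in scores.

    In a real system this would involve fine-tuning a model with the
    candidate loss and measuring forget quality and retained utility.
    """
    f, r = 0.8, 0.9  # stand-in forget/retain evaluation scores
    return -loss_fn(f, r)  # lower loss value -> higher fitness

def evolve(population, generations=5, population_size=3, seed=0):
    """Mutate a random parent each generation, then keep the fittest."""
    rng = random.Random(seed)
    for _ in range(generations):
        parent = rng.choice(population)
        w = rng.uniform(0.1, 2.0)
        # Mutation: perturb the weight on the retain term.
        child = lambda f, r, w=w, p=parent: p(f, r) + (0.5 - w) * r
        population = sorted(population + [child],
                            key=fitness, reverse=True)[:population_size]
    return population[0]

best = evolve(list(CANDIDATES))
```

Selection keeps the best candidates of each generation, so the top fitness is monotonically non-decreasing; in EvoMU the mutation step is instead performed by a language model proposing new loss formulations.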