LLM unlearning has emerged as a cost-effective alternative to full retraining for removing hazardous knowledge from pretrained models while preserving general utility. Recent RL-based methods such as RULE reformulate unlearning as learning a refusal behavior, but their on-policy optimization repeatedly samples from the same forget and retain/boundary prompts throughout training. We identify a critical inefficiency in this process. Easy cases converge quickly and then provide little useful gradient signal. Meanwhile, hard cases near the forget/retain boundary keep producing low-reward rollouts, and these rollouts are discarded after a single use. To address this issue, we propose ReRULE, an off-policy replay enhancement for reinforcement unlearning. ReRULE stores low-reward hard-case rollout groups in a replay buffer during early GRPO training and reuses them in later stages through importance-sampled off-policy updates, redirecting computation toward boundary cases that still require learning. Theoretically, we show that ReRULE yields a tighter convergence bound on hard cases than pure on-policy RULE. Empirically, ReRULE improves MUSE-Books Retain Quality from 46.3 to 56.2 while adding only 5–11% to training time across benchmarks. Its limited improvement on the simpler TOFU setting further supports the intended conditional behavior: replay is most beneficial when the disparity between hard and easy cases is pronounced.
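The replay mechanism described above (buffer low-reward GRPO groups early, then reuse them later with importance-sampled updates) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the ReRULE implementation. The class and function names are assumptions, and so are several design choices: the mean-reward threshold that flags a group as a hard case, the FIFO eviction, and the PPO-style clipped ratio. For brevity the sketch also uses sequence-level rather than token-level log-probabilities and omits any KL penalty. It also skips groups whose rewards are all equal, because GRPO's group-normalized advantages are zero for such groups and replaying them would contribute no gradient.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class RolloutGroup:
    """Hypothetical container: one prompt with G sampled responses."""
    prompt_id: str
    rewards: list[float]         # scalar reward per response
    behavior_logps: list[float]  # sequence log-prob under the policy that sampled it


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


class HardCaseReplayBuffer:
    """FIFO buffer of low-reward groups (threshold and capacity are assumptions)."""

    def __init__(self, capacity=1024, reward_threshold=0.5):
        self.capacity = capacity
        self.reward_threshold = reward_threshold
        self.groups: list[RolloutGroup] = []

    def maybe_add(self, group: RolloutGroup) -> bool:
        mean_r = sum(group.rewards) / len(group.rewards)
        # All-equal rewards give zero advantages, so such groups carry no signal.
        has_signal = max(group.rewards) > min(group.rewards)
        if mean_r < self.reward_threshold and has_signal:
            self.groups.append(group)
            if len(self.groups) > self.capacity:
                self.groups.pop(0)  # evict the oldest stored group
            return True
        return False

    def sample(self, k, rng=random):
        """Draw up to k stored groups without replacement."""
        return rng.sample(self.groups, min(k, len(self.groups)))


def replay_loss(group: RolloutGroup, current_logps, clip_eps=0.2):
    """Clipped importance-sampled surrogate for one replayed group (to minimize)."""
    advs = group_advantages(group.rewards)
    terms = []
    for adv, lp_new, lp_old in zip(advs, current_logps, group.behavior_logps):
        ratio = math.exp(lp_new - lp_old)  # pi_theta(y|x) / pi_behavior(y|x)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

In a training loop, one would call `maybe_add` on each GRPO group during the early phase. In later stages, `replay_loss` terms for groups drawn with `sample` would be mixed into the on-policy objective, using log-probabilities recomputed under the current policy. How the replayed and fresh losses are weighted is left open here. When the current policy still matches the behavior policy, every ratio equals 1 and the replayed surrogate reduces to the mean advantage, which is zero by construction.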