Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develop theoretical results under a two-mode mixture abstraction representing old and new tasks, proposed by Chen et al. (2025, arXiv:2510.18874), and formalize two forms of forgetting: (i) mass forgetting, where the old mixture weight collapses to zero, and (ii) old-component drift, where an already-correct old component shifts during training. For equal-covariance Gaussian modes, we prove that forward-KL objectives trained on data from the new distribution drive the old weight to zero, whereas reverse-KL objectives converge to the true target, thereby avoiding mass forgetting. Under reverse-KL training, the old mean is perturbed only through overlap-gated misassignment probabilities controlled by the Bhattacharyya coefficient, so the drift decays exponentially with mode separation; the local geometry is moreover well conditioned, yielding exponential convergence. We further quantify how replay interacts with these objectives: for forward-KL, replay must modify the training distribution to change the population optimum; for reverse-KL, replay leaves the population objective unchanged but prevents finite-batch starvation of the old mode through bounded importance weighting. Finally, we analyze three recently proposed near-on-policy post-training methods, SDFT (arXiv:2601.19897), TTT-Discover (arXiv:2601.16175), and OAPL (arXiv:2602.19362), through the same lens and derive explicit conditions under which each retains old mass and exhibits overlap-controlled drift. Overall, our results show that forgetting can be precisely quantified through the interaction of divergence direction, geometric behavioral overlap, sampling regime, and the visibility of past behavior during training.
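To fix notation for the two-mode abstraction, a minimal sketch (the symbols $w$, $\mu_{\mathrm{old}}$, $\mu_{\mathrm{new}}$, and $\Sigma$ are our choices, not fixed by the abstract): the model is

$$p_\theta(x) \;=\; w\,\mathcal{N}(x;\mu_{\mathrm{old}},\Sigma) \;+\; (1-w)\,\mathcal{N}(x;\mu_{\mathrm{new}},\Sigma),$$

mass forgetting is the collapse $w \to 0$, and old-component drift is movement of $\mu_{\mathrm{old}}$ away from its correct value. For equal-covariance modes the Bhattacharyya coefficient reduces to

$$\mathrm{BC} \;=\; \exp\!\Big(-\tfrac{1}{8}\,(\mu_{\mathrm{old}}-\mu_{\mathrm{new}})^{\top}\Sigma^{-1}(\mu_{\mathrm{old}}-\mu_{\mathrm{new}})\Big),$$

which decays exponentially in the mode separation, consistent with the exponentially decaying drift stated above.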
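The forward- versus reverse-KL contrast can also be checked numerically. Below is a minimal 1-D sketch, not the paper's construction: both means are frozen at their correct values so that only the old weight is trained, and the constants (SIGMA, MU_OLD, MU_NEW, the integration grid, and the sample size) are illustrative assumptions. Fitting the old weight by forward KL on new-mode data drives it to zero, while fitting it by reverse KL against the true two-mode target recovers it.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import expit       # sigmoid: unconstrained logit -> w in (0, 1)
    from scipy.integrate import trapezoid

    rng = np.random.default_rng(0)
    SIGMA, MU_OLD, MU_NEW = 1.0, -4.0, 4.0  # equal-covariance modes, well separated

    def gauss(x, m):
        return np.exp(-0.5 * ((x - m) / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))

    def mix(x, w):
        # Model with both means frozen at their correct values; only the old weight w is trained.
        return w * gauss(x, MU_OLD) + (1 - w) * gauss(x, MU_NEW)

    # True target keeps half its mass on the old mode; post-training data covers only the new mode.
    grid = np.linspace(-12.0, 12.0, 4001)
    p_star = mix(grid, 0.5)
    x_new = rng.normal(MU_NEW, SIGMA, size=5000)

    def forward_kl(t):
        # Forward KL against the new-mode training data = negative log-likelihood on x_new.
        return -np.mean(np.log(mix(x_new, expit(t)) + 1e-300))

    def reverse_kl(t):
        # Reverse KL against the true target, KL(p_theta || p*), by 1-D quadrature.
        p = mix(grid, expit(t)) + 1e-300
        return trapezoid(p * (np.log(p) - np.log(p_star + 1e-300)), grid)

    for name, loss in [("forward-KL", forward_kl), ("reverse-KL", reverse_kl)]:
        t = minimize_scalar(loss, bounds=(-20.0, 20.0), method="bounded").x
        print(f"{name}: fitted old weight w = {expit(t):.4f}")
    # Expected: forward-KL drives w toward 0 (mass forgetting);
    # reverse-KL recovers w near 0.5 (old mass retained).

Freezing the means isolates mass forgetting; letting them move would additionally exhibit the misassignment-driven drift of the old mean described above.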