Language models deployed in the wild make errors. However, simply updating the model on the corrected error instances causes catastrophic forgetting: the updated model makes errors on instances it learned during instruction tuning or upstream training. Randomly replaying upstream data yields unsatisfactory performance and often comes with high variance and poor controllability. We therefore forecast which upstream examples a model update will cause to be forgotten, improving the controllability and interpretability of the replay process. We train forecasting models on a collection of online-learned examples and their corresponding forgotten upstream pretraining examples. We propose a partially interpretable forecasting model based on the observation that changes in the pre-softmax logit scores of pretraining examples resemble those of online-learned examples; it performs decently on BART but fails on T5 models. We further show that a black-box classifier based on inner products of example representations achieves better forecasting performance across a range of setups. Finally, we show that replaying examples forecast to be forgotten reduces forgetting of upstream pretraining examples, demonstrating the practical utility of forecasting example forgetting.
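The black-box forecaster described above can be sketched minimally as follows. This is an illustrative assumption, not the paper's actual implementation: it scores each upstream example by the inner product of its representation with that of the newly learned example, then maps the score to a forgetting probability with a logistic function whose parameters (`w`, `b` here, hypothetical names) would be fit on observed (online-learned example, forgotten upstream example) pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def forecast_forgetting(z_learned, Z_upstream, w=1.0, b=0.0, threshold=0.5):
    """Hedged sketch of an inner-product forgetting forecaster.

    z_learned:  (d,) representation of the online-learned example
    Z_upstream: (n, d) representations of upstream pretraining examples
    w, b:       scalar logistic parameters (assumed; in practice fit on
                observed forgetting outcomes)
    Returns a boolean mask: True = forecast to be forgotten.
    """
    scores = Z_upstream @ z_learned                  # (n,) inner products
    probs = 1.0 / (1.0 + np.exp(-(w * scores + b)))  # logistic link
    return probs > threshold

# Toy usage: upstream examples whose representations align with the
# learned example are flagged, and their indices form the replay set.
z = rng.normal(size=8)
Z = rng.normal(size=(100, 8))
flagged = forecast_forgetting(z, Z)
replay_set = np.flatnonzero(flagged)  # indices to mix into the model update
```

Replaying only the flagged subset, rather than a random sample of upstream data, is what the abstract reports as reducing forgetting with better controllability.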