Despite huge progress on myriad generation tasks, pretrained language models (LMs) such as GPT2 still tend to generate repetitive text when maximization-based decoding algorithms are used for open-ended generation. We attribute their overestimation of token-level repetition probabilities to a learning bias: under the MLE loss, LMs capture simple repetitive patterns faster. We propose self-contrastive training, which penalizes the output of a premature checkpoint of the same model when it incorrectly predicts repetition; on two datasets, this mitigates repetition effectively while maintaining fluency. Furthermore, we find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
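To make "token-level repetition" concrete, the following is a minimal sketch of one common way to measure it: the fraction of tokens that already occur within a fixed window of preceding tokens. The function name `repetition_rate`, the `window` parameter, and its default value are assumptions made for illustration; the abstract does not specify the exact metric used.

```python
from typing import Sequence


def repetition_rate(tokens: Sequence[str], window: int = 16) -> float:
    """Fraction of tokens that already occur among the preceding `window` tokens.

    Illustrative definition only (an assumption, not taken from the abstract).
    """
    if not tokens:
        return 0.0
    repeated = 0
    for i, tok in enumerate(tokens):
        # Look back at most `window` tokens before position i.
        context = tokens[max(0, i - window):i]
        if tok in context:
            repeated += 1
    return repeated / len(tokens)


if __name__ == "__main__":
    looped = "the cat sat . the cat sat . the cat sat .".split()
    varied = "a quick brown fox jumps over one lazy dog".split()
    print(repetition_rate(looped))
    print(repetition_rate(varied))
```

Under this definition, a text stuck in a sentence-level loop scores close to 1, while a text with no repeated tokens inside the window scores 0.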