Language models (LMs) can generate hallucinations and incoherent outputs, which points to weak context dependency. Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency and have shown strong performance across diverse language generation tasks. However, we find that even with training, the performance gain from the cache component of current cache-LMs is suboptimal because the current hidden states are misaligned with those stored in the memory. In this work, we present HistAlign, a new training approach that ensures good cache alignment so that the model receives useful signals from the history. We first validate the idea on a simple synthetic task in which the memory is essential for correct predictions, and we show that the cache component of HistAlign is better aligned and improves overall performance. Next, we evaluate HistAlign on diverse downstream language generation tasks, including prompt continuation, abstractive summarization, and data-to-text. We demonstrate that HistAlign improves text coherence in open-ended generation and faithfulness in conditional generation. HistAlign also generalizes across different model families, showcasing its strength in improving the context dependency of LMs in diverse scenarios. Our code is publicly available at https://github.com/meetdavidwan/histalign.
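To make the cache mechanism concrete, the following is a minimal sketch of a generic cache-LM prediction step, not the HistAlign training method itself and not code from the linked repository. It assumes a common formulation in which the memory stores pairs of past hidden states and the tokens that followed them, the current hidden state is scored against the stored states by dot product, and the resulting cache distribution is interpolated with the LM distribution. All names (`cache_distribution`, `cache_lm_distribution`, `lam`, `theta`) are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cache_distribution(query, memory, vocab_size, theta=1.0):
    """Distribution over the vocabulary induced by the cache.

    memory: list of (hidden_state, next_token_id) pairs from recent history.
    query: the current hidden state.
    theta: assumed temperature-like scaling of the similarity scores.
    """
    weights = softmax([theta * dot(query, h) for h, _ in memory])
    probs = [0.0] * vocab_size
    for w, (_, token_id) in zip(weights, memory):
        probs[token_id] += w  # mass goes to the token that followed a similar state
    return probs


def cache_lm_distribution(lm_probs, query, memory, lam=0.3, theta=1.0):
    """Interpolate LM and cache distributions with assumed mixing weight lam."""
    if not memory:
        return list(lm_probs)  # no history yet: fall back to the plain LM
    cache_probs = cache_distribution(query, memory, len(lm_probs), theta)
    return [(1.0 - lam) * p + lam * c for p, c in zip(lm_probs, cache_probs)]


# Toy example: 3-token vocabulary, two memory entries.
lm_probs = [0.5, 0.3, 0.2]
memory = [([1.0, 0.0], 2), ([0.0, 1.0], 1)]

# A query close to the first stored state boosts token 2.
aligned = cache_lm_distribution(lm_probs, [5.0, 0.0], memory)

# A query with no similarity to either stored state spreads cache mass evenly.
unaligned = cache_lm_distribution(lm_probs, [0.0, 0.0], memory)
```

In this toy setup the cache only helps when the current hidden state is similar to the stored state whose continuation is relevant; when the similarity scores are uninformative, the cache contributes little useful signal. This is the kind of alignment between current and stored hidden states that the abstract refers to.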