Standard Transformer-based language models (LMs) scale poorly to long contexts. We propose Nugget2D, a solution based on dynamic contextual compression that extends the Nugget approach of Qin & Van Durme (2023) from BERT-like frameworks to decoder-only LMs. Nugget2D models history as compressed "nuggets" that are trained to allow for reconstruction, and it can be initialized with off-the-shelf models such as LLaMA. Experiments in language modeling, question answering, and summarization show that Nugget2D retains capabilities in these tasks while drastically reducing the time and space overhead of decoding. For example, in autoencoding experiments, Nugget2D can shrink context at a 20x compression ratio with a BLEU score of 98% for reconstruction, achieving nearly lossless encoding.
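To give a rough sense of what a 20x compression ratio could mean for decoding memory, the following back-of-the-envelope sketch counts the states kept for a context and the corresponding key/value cache size. Everything here is an illustrative assumption rather than a figure or implementation from the paper: the helper names `num_nuggets` and `kv_cache_bytes` are hypothetical, the layer count and hidden size are those of a roughly LLaMA-7B-sized model, values are stored in 16-bit precision, and each retained nugget is assumed to cost the same as one ordinary token position at every layer.

```python
import math


def num_nuggets(context_len: int, ratio: float) -> int:
    """Number of compressed states kept for a context of `context_len` tokens."""
    return math.ceil(context_len / ratio)


def kv_cache_bytes(num_states: int, n_layers: int, d_model: int,
                   bytes_per_value: int = 2) -> int:
    """Key plus value storage for `num_states` positions across all layers."""
    return 2 * n_layers * d_model * bytes_per_value * num_states


# Illustrative configuration (assumed, not taken from the paper).
N_LAYERS, D_MODEL = 32, 4096
context_len = 4096

full = kv_cache_bytes(context_len, N_LAYERS, D_MODEL)
compressed = kv_cache_bytes(num_nuggets(context_len, 20), N_LAYERS, D_MODEL)

print(f"full cache:       {full / 2**20:.1f} MiB")
print(f"compressed cache: {compressed / 2**20:.1f} MiB")
```

Under these assumptions the cache shrinks in direct proportion to the number of retained states; actual savings depend on how nuggets are selected and stored in a given implementation.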