Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method that enables LLMs to take a deep breath, encouraging them to summarize the information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate each chunk's information into its corresponding <SR> token. This allows LLMs to interpret information not only from individual historical tokens but also from the <SR> tokens, which aggregate the semantic information of their chunks. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
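The chunking and masking steps described above can be illustrated with a minimal sketch. The abstract does not specify the exact mask, so the rule used here is an assumption: each <SR> token attends only to the tokens of its own chunk (and to itself), while ordinary tokens keep standard causal attention over everything before them, including earlier <SR> tokens. Fixed-size chunking and the names `insert_sr` and `build_mask` are likewise hypothetical choices made for illustration, not the authors' implementation.

```python
SR = "<SR>"  # special summary token named in the abstract


def insert_sr(tokens, chunk_size):
    """Split tokens into fixed-size chunks and append SR after each chunk.

    Fixed-size chunking is an assumption; the abstract only says the text
    is segmented into multiple chunks.
    """
    out = []
    for start in range(0, len(tokens), chunk_size):
        out.extend(tokens[start:start + chunk_size])
        out.append(SR)
    return out


def build_mask(seq):
    """Return a boolean matrix where mask[i][j] is True if i may attend to j.

    Assumed rule (not confirmed by the source):
      - an SR token sees only its own chunk and itself;
      - any other token sees all earlier positions (causal attention).
    """
    n = len(seq)
    mask = [[False] * n for _ in range(n)]
    chunk_start = 0  # index of the first token in the current chunk
    for i, tok in enumerate(seq):
        if tok == SR:
            for j in range(chunk_start, i + 1):
                mask[i][j] = True
            chunk_start = i + 1  # the next chunk begins after this SR
        else:
            for j in range(i + 1):
                mask[i][j] = True
    return mask


seq = insert_sr(["a", "b", "c", "d", "e"], chunk_size=2)
mask = build_mask(seq)

# Print the mask row by row: 1 means "may attend", 0 means "masked out".
for tok, row in zip(seq, mask):
    print(f"{tok:>5} " + " ".join("1" if allowed else "0" for allowed in row))
```

In this sketch the <SR> rows are restricted to one chunk each, which is one way to push a chunk's information into its summary token; other mask designs would be equally consistent with the abstract.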