Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need an additional training procedure for each target long-context window, which requires long-context training data (e.g., 32k) and incurs high GPU training costs. To address these issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E²-LLM, which needs only one training procedure, dramatically reduces computation cost, and removes the need to collect long-context data. Concretely, first, the training data of E²-LLM only requires a short length (e.g., 4k), which greatly reduces the tuning cost. Second, the training procedure on the short training context window is performed only once, and different evaluation context windows can be supported at inference. Third, in E²-LLM, based on RoPE position embeddings, we introduce two augmentation methods on the scale and position index parameters, applied to different samples during training. They aim to make the model more robust to varying relative position differences when directly interpolating to an arbitrary context length at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of E²-LLM on challenging long-context tasks.
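The abstract does not say how the scale and position-index parameters are sampled, so what follows is a minimal sketch under stated assumptions, not the E²-LLM implementation. It uses standard RoPE with linear position interpolation (position p mapped to p / g, frequency base 10000) and a hypothetical per-sample augmentation that draws an integer scale g and an offset t uniformly. The names `rope_angles`, `apply_rope` and `sample_augmentation`, and the sampling ranges, are all illustrative assumptions. The sketch shows why the two parameters behave differently: shifting every position by t leaves RoPE attention scores unchanged, while g rescales the relative distances the model sees.

```python
import math
import random


def rope_angles(positions, dim, scale=1.0, base=10000.0):
    """RoPE rotation angles with linear position interpolation.

    Position p is mapped to p / scale, then multiplied by the per-pair
    frequency base ** (-2i / dim). Returns one list of angles per position.
    """
    if dim % 2:
        raise ValueError("dim must be even")
    freqs = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [[(p / scale) * f for f in freqs] for p in positions]


def apply_rope(x, angles):
    """Rotate consecutive pairs (x[2i], x[2i+1]) of one vector by angles[i]."""
    out = []
    for i, a in enumerate(angles):
        x0, x1 = x[2 * i], x[2 * i + 1]
        c, s = math.cos(a), math.sin(a)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def sample_augmentation(train_len, max_scale, rng):
    """Hypothetical per-sample augmentation (assumed uniform sampling).

    Draws an integer interpolation scale g in [1, max_scale] and an offset t
    such that the shifted positions t, ..., t + train_len - 1 stay below
    g * train_len, the context length the scaled positions emulate.
    """
    g = rng.randint(1, max_scale)
    t = rng.randint(0, train_len * g - train_len)
    positions = [t + i for i in range(train_len)]
    return g, t, positions


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_score(q, k, pos_q, pos_k, scale=1.0):
    """Dot product of RoPE-rotated query and key at the given positions."""
    aq, ak = rope_angles([pos_q, pos_k], len(q), scale)
    return dot(apply_rope(q, aq), apply_rope(k, ak))
```

Because each pair rotation preserves inner products up to the angle difference, `attention_score` depends only on `(pos_q - pos_k) / scale`. The offset t therefore exposes the model to large absolute indices without changing relative distances, while varying g during training is what exposes it to different interpolated relative distances.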