Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs during the alignment phase, without requiring any effort beyond training on data of the original length. SkipAlign is built on the premise that long-range dependencies are fundamental to an LLM's capacity to handle long contexts. Rather than merely expanding the length of input samples, SkipAlign synthesizes long-range dependencies from the perspective of position indices. It does so by strategically inserting skipped positions within instruction-following samples, exploiting the semantic structure of the data to effectively extend the context. Extensive experiments on base models with a variety of context window sizes demonstrate SkipAlign's effectiveness across a spectrum of long-context tasks. Notably, with a careful choice of base model and alignment datasets, SkipAlign with only 6B parameters achieves its best results and performs comparably to strong baselines such as GPT-3.5-Turbo-16K on LongBench.
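The core idea of synthesizing long-range dependencies via position indices can be sketched as follows: instead of lengthening the text itself, random jumps are inserted into a short sample's position ids so that its tokens span a much wider range of positions. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name `skip_positions` and the uniform random placement of jumps are assumptions (SkipAlign guides skipping by the samples' instruction-following structure).

```python
import random

def skip_positions(seq_len, target_len, num_skips=3, seed=0):
    """Synthesize position ids that make a short sample span a longer context.

    Returns `seq_len` strictly increasing position ids in [0, target_len),
    with `num_skips` random jumps accounting for the skipped positions.
    Illustrative only: the real method chooses skip points using the
    semantic structure of instruction-following data.
    """
    rng = random.Random(seed)
    extra = target_len - seq_len          # total number of positions to skip over
    # Choose where the jumps occur and distribute the skipped span among them.
    cut_points = sorted(rng.sample(range(1, seq_len), num_skips))
    jump_sizes = [0] * num_skips
    for _ in range(extra):
        jump_sizes[rng.randrange(num_skips)] += 1
    positions, offset, j = [], 0, 0
    for i in range(seq_len):
        while j < num_skips and cut_points[j] == i:
            offset += jump_sizes[j]       # apply the jump before this token
            j += 1
        positions.append(i + offset)
    return positions
```

Feeding such position ids to the model during alignment exposes it to long-range relative distances while the actual sequence, and hence the training cost, stays short.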