Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to process long contexts, yet a notable gap remains in generating long, aligned outputs. This limitation stems from a training gap: pre-training lacks effective instructions for long-text generation, and post-training data primarily consists of short query-response pairs. Current approaches, such as instruction backtranslation and behavior imitation, face challenges including data quality, copyright issues, and constraints on proprietary model usage. In this paper, we introduce an innovative iterative training framework called Self-Lengthen that leverages only the intrinsic knowledge and skills of LLMs, without the need for auxiliary data or proprietary models. The framework consists of two roles: the Generator and the Extender. The Generator produces the initial response, which is then split and expanded by the Extender. This process yields a new, longer response, which is used to train both the Generator and the Extender iteratively. Through this process, the models are progressively trained to handle increasingly longer responses. Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation when applied to top open-source LLMs such as Qwen2 and LLaMA3. Our code is publicly available at https://github.com/QwenLM/Self-Lengthen.
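The generate-split-extend loop described above can be sketched in a few lines. This is a minimal toy illustration of the iteration structure only, not the authors' implementation: `generate` and `extend` stand in for the two fine-tuned LLM roles, and the split-at-midpoint heuristic is an assumption for the sketch.

```python
# Toy sketch of the Self-Lengthen loop. The real framework fine-tunes
# two LLM roles (Generator, Extender); here plain functions stand in.

def generate(query: str, length_hint: int) -> str:
    # Stand-in Generator: emit an initial response of ~length_hint tokens.
    return " ".join(f"w{i}" for i in range(length_hint))

def extend(query: str, prefix: str) -> str:
    # Stand-in Extender: rewrite a response prefix into a longer full
    # response (here, tripling its length to simulate expansion).
    words = prefix.split()
    extra = [f"x{i}" for i in range(2 * len(words))]
    return " ".join(words + extra)

def self_lengthen(query: str, iterations: int = 3, init_len: int = 8):
    """One self-lengthening cycle: generate an initial response, then
    repeatedly take its first half and let the Extender expand it,
    collecting (query, longer_response) pairs as new training data."""
    response = generate(query, init_len)
    training_pairs = []
    for _ in range(iterations):
        tokens = response.split()
        prefix = " ".join(tokens[: len(tokens) // 2])
        response = extend(query, prefix)           # longer each round
        training_pairs.append((query, response))   # data for both roles
    return response, training_pairs
```

Each round shrinks the response to a prefix and expands that prefix beyond the previous full length, so the collected training pairs grow monotonically longer, mirroring how the paper bootstraps progressively longer outputs without external data.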