Large language models (LLMs) such as ChatGPT and GPT-4 have attracted great attention for their surprising performance on a wide range of NLP tasks. Length-controlled generation has emerged as an important topic because it lets users fully leverage LLMs in more real-world scenarios, such as generating an answer or essay of a desired length. In addition, autoregressive generation in LLMs is extremely time-consuming, and the ability to control the generated length can reduce inference cost by limiting it. We therefore propose a prompt-based length control method to achieve high-accuracy length-controlled generation. In particular, we adopt reinforcement learning with a reward signal given by either trainable or rule-based reward models, which further enhances the length-control ability of LLMs by rewarding outputs that follow the pre-defined control instructions. To enable rule-based inference, we also introduce a standard prompt extractor that collects the standard control information from users' input. Experiments show that our method significantly improves the accuracy of prompt-based length control on summarization tasks with popular datasets such as CNNDM and NYT. Both the standard prompt extractor and the RL-tuned model show strong generalization to unseen control prompt templates.
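To make the idea of a rule-based reward concrete, the following is a minimal sketch and not the exact reward used in this work. It assumes the control information has already been extracted into a structured form, and that length is counted in whitespace-separated words. The `LengthControl` type, the four control kinds, and the `length_reward` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LengthControl:
    """Hypothetical structured control information (assumed already extracted)."""
    kind: str                   # assumed kinds: "equal", "less", "more", "between"
    low: Optional[int] = None   # target for "equal", lower bound for "more"/"between"
    high: Optional[int] = None  # upper bound for "less"/"between"


def length_reward(output: str, control: LengthControl) -> float:
    """Return 0.0 if the output satisfies the control, else minus the word-count gap."""
    n = len(output.split())  # assumption: length measured in whitespace-separated words
    if control.kind == "equal":
        return float(-abs(n - control.low))
    if control.kind == "less":
        return float(-max(0, n - control.high))
    if control.kind == "more":
        return float(-max(0, control.low - n))
    if control.kind == "between":
        return float(-max(0, control.low - n, n - control.high))
    raise ValueError(f"unknown control kind: {control.kind}")
```

Under these assumptions, an output that meets the instruction receives the maximum reward of zero, and the penalty grows linearly with the number of words by which it misses the target. A reward of this shape could then serve as the scalar signal in a reinforcement learning loop.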