Large language models (LLMs) such as ChatGPT and GPT-4 have recently attracted great attention for their striking gains in performance. Length-controlled generation has emerged as an important topic for LLMs, since it lets users apply these models in more real-world scenarios, such as generating an answer or essay of a desired length. In addition, autoregressive generation in LLMs is time-consuming, and the ability to control the generated length can reduce inference cost by limiting the output length, thereby satisfying different needs. We therefore propose a prompt-based length control method for length-controlled generation that can be widely applied to GPT-style LLMs. In particular, we adopt reinforcement learning with a reward signal given by either a trainable or a rule-based reward model, which steers the LLM's generation by rewarding outputs that meet a pre-defined target length. Experiments show that our method significantly improves the accuracy of prompt-based length control for summarization on popular datasets such as CNNDM and NYT. We believe this length-controllable ability can open up further possibilities in the era of LLMs.
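To make the idea of a rule-based reward concrete, the following is a minimal sketch, not the authors' implementation. It assumes the target length appears in the prompt as "N words", that length is measured in whitespace-separated words (the actual method may count tokens instead), and that the reward is the negative relative deviation from the target. The function names `parse_target_length` and `length_reward` are hypothetical.

```python
import re
from typing import Optional


def parse_target_length(prompt: str) -> Optional[int]:
    """Extract a target such as '50 words' from the prompt (assumed prompt format)."""
    match = re.search(r"(\d+)\s+words?", prompt)
    return int(match.group(1)) if match else None


def length_reward(prompt: str, output: str) -> float:
    """Rule-based reward: 0.0 for an exact length match, more negative as the gap grows.

    Assumption: length is the number of whitespace-separated words.
    """
    target = parse_target_length(prompt)
    if target is None or target <= 0:
        # No usable length instruction in the prompt: give a neutral reward.
        return 0.0
    actual = len(output.split())
    return -abs(actual - target) / target
```

In a reinforcement learning loop, a scalar like this would be computed for each sampled output and passed to the policy update, either alone or combined with a quality reward. How the rewards are combined is not specified here.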