Despite the great success of large language models (LLMs), efficiently controlling the length of the output sequence remains a challenge. In this paper, we propose Hansel, an efficient framework for length control in LLMs that does not affect their generation ability. Hansel utilizes periodically output hidden special tokens to keep track of the remaining target length of the output sequence. Together with techniques to avoid abrupt termination of the output, this seemingly simple method proves efficient and versatile while not harming the coherency and fluency of the generated text. The framework can be applied to any pre-trained LLM during its finetuning stage, regardless of the model's original positional encoding method. We demonstrate this by finetuning four different LLMs with Hansel and show that the mean absolute error of the output sequence length decreases significantly for every model and dataset compared to prompt-based length-control finetuning. Moreover, the framework shows a substantially improved ability to extrapolate to target lengths unseen during finetuning, such as long dialog responses or extremely short summaries. This indicates that the model learns general means of length control rather than learning to match output lengths to those seen during training.
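As a minimal illustration of the idea described above (not the authors' implementation), the sketch below interleaves hidden special tokens into a target sequence at a fixed period, where each token encodes how many content tokens remain until the target length. The token format `<len:k>` and the period are assumptions made for this example.

```python
def insert_length_tokens(tokens, period=10):
    """Interleave a hidden <len:k> marker every `period` content tokens,
    where k is the number of content tokens still remaining.

    This is a hypothetical sketch of the periodic length-tracking tokens
    the abstract describes; names and the period are illustrative only.
    """
    out = []
    total = len(tokens)
    for i, tok in enumerate(tokens):
        if i > 0 and i % period == 0:
            # Marker tells the model how many content tokens are left.
            out.append(f"<len:{total - i}>")
        out.append(tok)
    # Terminal marker: the remaining target length has reached zero.
    out.append("<len:0>")
    return out

words = [f"w{i}" for i in range(25)]
augmented = insert_length_tokens(words, period=10)
# Markers appear before the 10th and 20th content tokens,
# counting down the remaining length (15, then 5, then 0).
```

During finetuning, such markers would be part of the training targets but hidden from the final user-facing output, letting the model condition its generation on the remaining budget.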