Large language models (LLMs) have demonstrated impressive instruction-following capabilities, yet they still struggle to accurately control the length of the text they generate, a fundamental requirement in many real-world applications. Existing length-control methods fine-tune the parameters of LLMs, which is inefficient and suboptimal for practical use. In this paper, we propose a novel iterative sampling framework for text length control that integrates the Metropolis-Hastings algorithm with an importance-sampling acceleration strategy. The framework efficiently and reliably steers LLMs to generate length-constrained text without modifying the underlying parameters, thereby preserving the original capabilities of the models. Experimental results show that our framework achieves nearly 100\% length-control success rates on Llama3.1 for tasks such as length-controlled abstractive summarization and length-constrained instruction following, with minimal additional computational overhead. These results also highlight the method's potential for precise length control across a broader range of applications without compromising the versatility of LLMs.
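The accept/reject loop at the heart of a Metropolis-Hastings approach to length control can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's implementation: `toy_sampler` stands in for LLM draws, the linear length penalty and temperature `tau` are invented, and the importance-sampling acceleration step is omitted entirely.

```python
import math
import random

def length_energy(num_words, target_len, tau=2.0):
    """Soft length penalty: zero at the target, growing linearly away from it."""
    return abs(num_words - target_len) / tau

def mh_length_control(sample_fn, target_len, steps=200, seed=0):
    """Metropolis-Hastings over whole candidate generations.

    sample_fn(rng) stands in for one LLM draw and returns a candidate text.
    Candidates are scored only by how far their word count is from target_len;
    the loop also keeps the lowest-energy accepted candidate as a convenience.
    """
    rng = random.Random(seed)
    current = sample_fn(rng)
    current_e = length_energy(len(current.split()), target_len)
    best, best_e = current, current_e
    for _ in range(steps):
        proposal = sample_fn(rng)
        prop_e = length_energy(len(proposal.split()), target_len)
        # Accept with probability min(1, exp(current_e - prop_e)):
        # proposals at least as close to the target length are always accepted,
        # worse ones survive with exponentially decaying probability.
        if rng.random() < math.exp(min(0.0, current_e - prop_e)):
            current, current_e = proposal, prop_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best

# Toy stand-in for an LLM: emits sentences of 5 to 30 placeholder words.
def toy_sampler(rng):
    return " ".join(["word"] * rng.randint(5, 30))

result = mh_length_control(toy_sampler, target_len=12)
```

Because improving proposals are always accepted, the chain concentrates near the target length; in a real system the proposal distribution would be the LLM itself, so accepted samples stay fluent while satisfying the length constraint.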