The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a specific length, large language models often struggle to meet users' needs due to their inherent difficulty in accurately perceiving numerical constraints. To explore the ability of large language models to control the length of generated responses, we propose the Target Length Generation Task (TLG) and design two metrics, Precise Match (PM) and Flexible Match (FM), to evaluate the model's performance in adhering to specified response lengths. Furthermore, we introduce a novel, model-agnostic approach called Ruler, which employs Meta Length Tokens (MLTs) to enhance the instruction-following ability of large language models under length-constrained instructions. Specifically, Ruler equips LLMs with the ability to generate responses of a specified length based on length constraints within the instructions. Moreover, Ruler can automatically generate an appropriate MLT when length constraints are not explicitly provided, demonstrating excellent versatility and generalization. Comprehensive experiments show the effectiveness of Ruler across different LLMs on the Target Length Generation Task, e.g., an average gain of 27.97 on PM and 29.57 on FM at the All Level. In addition, we conduct extensive ablation experiments to further substantiate the efficacy and generalization of Ruler. Our code and data are available at https://github.com/Geaming2002/Ruler.