Diverse language model responses are crucial for creative generation, open-ended tasks, and self-improvement training. We show that common diversity metrics, and even reward models used for preference optimization, systematically bias models toward shorter outputs, limiting expressiveness. To address this, we introduce Diverse, not Short (Diverse-NS), a length-controlled data selection strategy that improves response diversity while maintaining length parity. By generating and filtering preference data that balances diversity, quality, and length, Diverse-NS enables effective training using only 3,000 preference pairs. Applied to LLaMA-3.1-8B and the Olmo-2 family, Diverse-NS substantially enhances lexical and semantic diversity. We observe consistent improvements in diversity, with only minor reductions or occasional gains in response quality, on four creative generation tasks: Divergent Associations, Persona Generation, Alternate Uses, and Creative Writing. Surprisingly, experiments with the Olmo-2 model family (7B and 13B) show that smaller models like Olmo-2-7B can serve as effective "diversity teachers" for larger models. By explicitly addressing length bias, our method efficiently pushes models toward more diverse and expressive outputs.
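To make the length-controlled selection idea concrete, the following is a minimal, hypothetical sketch of how one might filter preference pairs so that the "chosen" response is more diverse than the "rejected" one while both have comparable lengths. The `Response` dataclass, score names, and all thresholds are illustrative assumptions, not the paper's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    diversity: float  # hypothetical diversity score (e.g., lexical diversity)
    quality: float    # hypothetical quality score (e.g., from a reward model)

def length_matched_pairs(responses, min_quality=0.5,
                         max_len_ratio=1.1, min_div_gap=0.1):
    """Build (chosen, rejected) preference pairs that favor diversity while
    keeping the two responses' lengths comparable, so preference training
    does not drift toward shorter outputs. Thresholds are illustrative."""
    pairs = []
    for a in responses:
        for b in responses:
            if a is b:
                continue
            la, lb = len(a.text.split()), len(b.text.split())
            ratio = max(la, lb) / max(min(la, lb), 1)  # length parity check
            if (a.diversity - b.diversity >= min_div_gap
                    and a.quality >= min_quality
                    and ratio <= max_len_ratio):
                pairs.append((a, b))  # a = chosen (more diverse), b = rejected
    return pairs
```

The key design point this sketch illustrates is that the length-ratio constraint removes pairs where the diversity gap is confounded with a length gap, which is the bias the abstract describes.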