Alignment with human preferences prevents large language models (LLMs) from generating misleading or toxic content, but it requires costly human feedback. Assuming human annotation resources are limited, there are two ways to allocate them: labeling more diverse PROMPTS or more diverse RESPONSES. Nonetheless, a direct comparison of their impact is absent. In this work, we first control the diversity of both sides through the number of samples used for fine-tuning, which directly reflects their influence. We find that more responses for fewer prompts, rather than numerous prompts, better trigger LLMs for human alignment. Additionally, the concept of diversity for prompts can be more complex than that for responses, which is typically quantified by single digits. Consequently, we propose a new formulation of prompt diversity, which further implies a linear correlation with the final performance of LLMs after fine-tuning. We also leverage it for data augmentation and conduct experiments to show its effect on different algorithms.