Reinforcement learning from human feedback plays a crucial role in aligning language models with human preferences, which are traditionally represented through comparisons between pairs or sets of responses within a given context. While many studies have enhanced algorithmic techniques to optimize learning from such data, this work shifts the focus to improving preference learning through a data-centric approach. Specifically, we propose enriching existing preference datasets with machine-generated rationales that explain the reasons behind each choice. We develop a simple and principled framework for augmenting current preference learning methods with this rationale information. Our comprehensive analysis highlights how rationales enhance learning efficiency. Extensive experiments reveal that rationale-enriched preference learning offers multiple advantages: it improves data efficiency, accelerates convergence to higher-performing models, and reduces verbosity bias and hallucination. Furthermore, the framework is versatile enough to integrate with various preference optimization algorithms. Overall, our findings highlight the potential of re-imagining data design for preference learning, demonstrating that even freely available, machine-generated rationales can significantly boost performance across multiple dimensions. The code repository is available at https://github.com/reds-lab/preference-learning-with-rationales
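As a concrete illustration of the data-centric idea, the sketch below shows one way a standard preference record could be enriched with a machine-generated rationale. This is a minimal sketch under stated assumptions, not the authors' exact pipeline: the `PreferencePair` structure, the prompt template, and the `llm_generate` callable are hypothetical names introduced here for illustration.

```python
# Illustrative sketch (hypothetical names, not the paper's exact pipeline):
# attaching a machine-generated rationale to a standard preference record.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    # Explanation of why `chosen` is preferred over `rejected`,
    # filled in by an off-the-shelf LLM.
    rationale: Optional[str] = None


# Hypothetical prompt template for eliciting a rationale.
RATIONALE_TEMPLATE = (
    "Given the prompt:\n{prompt}\n\n"
    "Response A (preferred):\n{chosen}\n\n"
    "Response B (rejected):\n{rejected}\n\n"
    "Briefly explain why Response A is preferable to Response B."
)


def enrich_with_rationale(
    pair: PreferencePair, llm_generate: Callable[[str], str]
) -> PreferencePair:
    """Query an LLM for a rationale and attach it to the preference pair.

    `llm_generate` is any callable mapping a prompt string to generated
    text, e.g. a thin wrapper around a hosted or local model.
    """
    query = RATIONALE_TEMPLATE.format(
        prompt=pair.prompt, chosen=pair.chosen, rejected=pair.rejected
    )
    pair.rationale = llm_generate(query)
    return pair
```

The resulting rationale field can then be consumed by a rationale-aware variant of a preference optimization objective (e.g., alongside DPO-style training), which is how such enriched data would plug into existing pipelines.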