Leveraging human preferences to steer the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling remain a bottleneck for these systems, particularly at large scale. Hence, selecting the most informative points for acquiring human feedback may considerably reduce the cost of preference labeling and enable the further development of LLMs. Bayesian Active Learning provides a principled framework for addressing this challenge and has demonstrated remarkable success in diverse settings. However, previous attempts to employ it for Preference Modeling did not meet these expectations. In this work, we identify that naive epistemic uncertainty estimation leads to the acquisition of redundant samples. We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM. Notably, our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels on two popular human preference datasets and outperforms previous stochastic Bayesian acquisition policies.
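To make the two ingredients of such a policy concrete, the following is a minimal sketch under stated assumptions, not the BAL-PM implementation. It assumes an ensemble (or a set of posterior samples) that gives each member's probability that one response is preferred over the other, measures epistemic uncertainty with a BALD-style mutual information, and approximates the gain in prompt-distribution entropy with the log distance to the k-th nearest already-acquired prompt feature (a k-nearest-neighbour entropy proxy). The names `select_batch`, `bald_score`, `knn_log_distance`, and the parameters `beta`, `k`, and `temperature` are hypothetical and chosen for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) variable; defined as 0 at p in {0, 1}."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bald_score(member_probs: Sequence[float]) -> float:
    """Assumed epistemic term: H[mean p] - mean H[p] over ensemble members."""
    mean_p = sum(member_probs) / len(member_probs)
    expected_h = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    return binary_entropy(mean_p) - expected_h


def knn_log_distance(x: Sequence[float], points: Sequence[Sequence[float]], k: int) -> float:
    """Assumed diversity proxy: log distance from x to its k-th nearest neighbour."""
    dists = sorted(math.dist(x, p) for p in points)
    kth = dists[min(k, len(dists)) - 1]
    return math.log(kth + 1e-12)  # small offset avoids log(0) for exact duplicates


def select_batch(
    member_probs: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    acquired_features: Sequence[Sequence[float]],
    batch_size: int,
    beta: float = 1.0,
    k: int = 1,
    temperature: Optional[float] = None,
    seed: int = 0,
) -> List[int]:
    """Hypothetical policy: pick pool indices scoring uncertainty + beta * diversity."""
    rng = random.Random(seed)
    acquired = [list(f) for f in acquired_features]
    remaining = list(range(len(member_probs)))
    uncertainty = [bald_score(p) for p in member_probs]
    chosen: List[int] = []
    for _ in range(min(batch_size, len(remaining))):
        scores = {}
        for i in remaining:
            diversity = knn_log_distance(features[i], acquired, k) if acquired else 0.0
            scores[i] = uncertainty[i] + beta * diversity
        if temperature is None:
            # Greedy: highest combined score (ties go to the lowest index).
            pick = max(remaining, key=lambda i: scores[i])
        else:
            # Stochastic: sample from a softmax over scores at the given temperature.
            top = max(scores.values())
            weights = [math.exp((scores[i] - top) / temperature) for i in remaining]
            pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
        # The picked prompt now counts as acquired, so near-duplicates score lower next round.
        acquired.append(list(features[pick]))
    return chosen
```

With `beta=0.0` the sketch reduces to pure uncertainty sampling and will happily pick two near-identical, equally uncertain prompts; with `beta > 0` the second near-duplicate is penalized once its twin (or a close neighbour) has been acquired, which illustrates the redundancy problem described above. Setting `temperature` turns the greedy rule into a sampled one; how BAL-PM itself weighs and samples the two terms is not specified here.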