The advent of large language models (LLMs) has sparked significant interest in using natural language for preference learning. However, existing methods often suffer from high computational cost, taxing human supervision, and a lack of interpretability. To address these issues, we introduce MAPLE, a framework for large language model-guided Bayesian active preference learning. MAPLE leverages LLMs to model the distribution over preference functions, conditioning it on both natural language feedback and conventional preference-learning feedback, such as pairwise trajectory rankings. MAPLE also employs active learning to systematically reduce uncertainty in this distribution, and incorporates a language-conditioned active query selection mechanism to identify queries that are both informative and easy to answer, thus reducing human burden. We evaluate MAPLE's sample efficiency and preference inference quality on two benchmarks, including a real-world vehicle route planning benchmark built on OpenStreetMap data. Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
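To make the active-querying idea concrete, the following is a minimal sketch (not MAPLE itself) of Bayesian active preference learning from pairwise rankings. It assumes a hypothetical linear preference function over trajectory features, a particle approximation of the posterior, a Bradley-Terry answer model, and predictive-entropy query selection; all names and choices here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trajectory is summarized by a feature vector,
# and the (unknown) preference function is linear with weights w.
n_features = 3
true_w = np.array([0.8, -0.5, 0.3])  # simulated "human" preferences

# Particle approximation of the posterior p(w | feedback so far).
particles = rng.normal(size=(500, n_features))
weights = np.full(500, 1.0 / 500)

def answer_prob(w, fa, fb):
    # Bradley-Terry likelihood that trajectory a is preferred over b.
    return 1.0 / (1.0 + np.exp(-(fa - fb) @ np.atleast_2d(w).T)).ravel()

def update(particles, weights, fa, fb, a_preferred):
    # Bayesian update of particle weights given one pairwise answer.
    p = answer_prob(particles, fa, fb)
    lik = p if a_preferred else 1.0 - p
    new_w = weights * lik
    return new_w / new_w.sum()

def query_entropy(particles, weights, fa, fb):
    # Predictive entropy of the human's answer under the posterior:
    # high entropy means the model is most uncertain about this comparison.
    p = float(weights @ answer_prob(particles, fa, fb))
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

# Candidate trajectory pairs; select the most informative query.
candidates = [(rng.normal(size=n_features), rng.normal(size=n_features))
              for _ in range(20)]
fa, fb = max(candidates, key=lambda ab: query_entropy(particles, weights, *ab))

# Simulate the human's answer from true_w, then update the posterior.
a_pref = bool(answer_prob(true_w, fa, fb)[0] > 0.5)
weights = update(particles, weights, fa, fb, a_pref)
```

MAPLE additionally conditions this distribution on natural-language feedback via an LLM and scores queries for ease of answering, neither of which this numeric sketch attempts to capture.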