Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation

Designing preference elicitation (PE) methodologies that can quickly ascertain a user's top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) enable fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance NL exploration and exploitation of user preferences over an arbitrary item set. In contrast, traditional Bayesian optimization (BO) PE methods define theoretically optimal PE strategies, but cannot use NL item descriptions or generate NL queries, and unrealistically assume that users can express preferences through direct item ratings and comparisons. To overcome the limitations of both approaches, we formulate NL-PE in a BO framework that generates NL queries to actively elicit NL feedback, reducing uncertainty over item utilities in order to identify the best recommendation. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain preference beliefs, and BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to guide LLM query generation. We numerically evaluate our methods in controlled experiments, finding that PEBOL achieves up to 131% improvement in MAP@10 after 10 turns of cold-start NL-PE dialogue compared to monolithic GPT-3.5, despite relying on a much smaller 400M-parameter NLI model for preference inference.

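To make the belief-maintenance and acquisition loop described in the abstract more concrete, the following is a minimal sketch, not PEBOL's actual implementation. It assumes a Beta belief per item, a soft update driven by an entailment probability, and a simplification in which only the queried item's belief is updated each turn; the abstract specifies none of these details. All names (`BetaBelief`, `toy_entailment`, `run_dialogue`) are hypothetical, and `toy_entailment` is a keyword-matching stand-in for a real NLI model and a real user response.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief over one item's utility in [0, 1] (assumed form)."""
    alpha: float = 1.0
    beta: float = 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def std(self) -> float:
        n = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (n * n * (n + 1.0)))

    def update(self, entail_prob: float) -> None:
        # Soft update: treat the entailment probability as a fractional
        # "success". This rule is an assumption of the sketch.
        self.alpha += entail_prob
        self.beta += 1.0 - entail_prob


def select_thompson(beliefs, rng):
    """Thompson Sampling: draw one utility per item, pick the largest draw."""
    samples = [rng.betavariate(b.alpha, b.beta) for b in beliefs]
    return max(range(len(beliefs)), key=samples.__getitem__)


def select_ucb(beliefs, c=1.0):
    """UCB: pick the item with the largest mean + c * std (ties -> lowest index)."""
    scores = [b.mean() + c * b.std() for b in beliefs]
    return max(range(len(beliefs)), key=scores.__getitem__)


def toy_entailment(item_desc: str, liked_word: str) -> float:
    """Hypothetical stand-in for an NLI model scoring a user utterance
    against an item description; here it is just keyword matching."""
    return 0.9 if liked_word in item_desc.lower() else 0.1


def run_dialogue(items, liked_word, turns=10, policy="ts", seed=0):
    """Simulate `turns` rounds of elicitation and return (ranking, beliefs)."""
    rng = random.Random(seed)
    beliefs = [BetaBelief() for _ in items]
    for _ in range(turns):
        if policy == "ts":
            idx = select_thompson(beliefs, rng)
        else:
            idx = select_ucb(beliefs)
        # A full system would have an LLM generate an NL query about
        # items[idx] and score the user's NL reply with NLI; this sketch
        # collapses both steps into the toy scorer.
        beliefs[idx].update(toy_entailment(items[idx], liked_word))
    ranking = sorted(range(len(items)),
                     key=lambda i: beliefs[i].mean(), reverse=True)
    return ranking, beliefs


if __name__ == "__main__":
    catalog = ["mild soup", "plain toast", "spicy thai noodles"]
    ranking, _ = run_dialogue(catalog, liked_word="spicy", policy="ucb")
    print([catalog[i] for i in ranking])
```

The selection rule is the only part that changes between the TS and UCB variants; the belief update is shared. In this toy setting the deterministic UCB policy tries each item once, then concentrates its queries on the item whose description matches the simulated preference.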