Actively inferring user preferences, for example by asking good questions, is important for any human-facing decision-making system. Active inference allows such systems to adapt and personalize themselves to nuanced individual preferences. To enable this ability for instruction-tuned large language models (LLMs), one may prompt them to ask users questions to infer their preferences, transforming the language models into more robust, interactive systems. Out of the box, however, these models extract preferences inefficiently: the questions they generate are uninformative, requiring many user interactions and impeding the usability of the downstream system. In this work, we introduce an inference-time algorithm that helps LLMs quickly infer preferences by asking more informative questions. Our algorithm uses a probabilistic model whose conditional distributions are defined by prompting an LLM, and returns questions that optimize expected entropy and expected model change. Results in a simplified interactive web shopping setting with real product items show that an LLM equipped with our entropy reduction algorithm outperforms baselines with the same underlying LLM on task performance while using fewer user interactions.
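To make the question-selection criterion concrete, here is a minimal sketch of choosing the question with the greatest expected entropy reduction over a belief about user preferences. This is an illustrative toy, not the paper's implementation: the hypothesis set, questions, and answer likelihoods are hypothetical stand-ins for the conditional distributions that the paper defines by prompting an LLM.

```python
import math

def entropy(p):
    # Shannon entropy (bits) of a discrete distribution.
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior(prior, likelihoods):
    # Bayes update over preference hypotheses: P(h | a) ∝ P(a | h) P(h).
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def expected_posterior_entropy(prior, answer_model):
    # answer_model[a][h] = P(answer a | hypothesis h). In the paper's
    # setting, these conditionals would come from prompting an LLM;
    # here they are fixed toy numbers.
    total = 0.0
    for likelihoods in answer_model:
        p_answer = sum(p * l for p, l in zip(prior, likelihoods))
        if p_answer > 0:
            total += p_answer * entropy(posterior(prior, likelihoods))
    return total

def best_question(prior, questions):
    # Pick the question whose expected posterior entropy is lowest,
    # i.e. the largest expected reduction in uncertainty about preferences.
    return min(questions, key=lambda q: expected_posterior_entropy(prior, questions[q]))

# Toy example: two preference hypotheses with a uniform prior.
prior = [0.5, 0.5]
questions = {
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],  # any answer leaves the belief unchanged
    "informative":   [[0.9, 0.1], [0.1, 0.9]],  # the answer largely separates the hypotheses
}
print(best_question(prior, questions))  # -> informative
```

The uninformative question leaves expected posterior entropy at 1 bit, while the informative one drops it to roughly 0.47 bits, so the selector prefers it; the same scoring extends directly to an LLM-generated candidate pool.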