As Large Language Models (LLMs) are increasingly deployed in applications such as travel assistance and purchasing support, they are often required to make subjective choices on behalf of users in settings where no objectively correct answer exists. We study LLM decision-making in a travel-assistant context by presenting models with choice dilemmas and analyzing their responses with multinomial logit models to derive implied willingness-to-pay (WTP) estimates. We then compare these WTP values with human benchmark values from the economics literature. In addition to a baseline setting, we examine how model behavior changes under more realistic conditions, including the provision of information about users' past choices and persona-based prompting. Our results show that meaningful WTP values can be derived for larger LLMs, but that these models display systematic deviations at the attribute level. They also tend to overestimate human WTP overall, particularly when expensive options or business-oriented personas are introduced. Conditioning models on prior preferences for cheaper options yields valuations that are closer to human benchmarks. Overall, our findings highlight both the potential and the limitations of using LLMs for subjective decision support and underscore the importance of careful model selection, prompt design, and user representation when deploying such systems in practice.
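To illustrate how implied WTP can be read off a multinomial logit fit, the following is a minimal sketch, not the estimation pipeline used in this work. It assumes a utility that is linear in the attributes, in which case the WTP for an attribute is the negative ratio of its coefficient to the price coefficient. The data are synthetic, and the attribute (`wifi`), the coefficient values, and all function names are hypothetical choices made only for this example.

```python
import numpy as np


def choice_probabilities(X, beta):
    """Softmax choice probabilities for each choice set.

    X has shape (n_sets, n_alts, n_attrs); beta has shape (n_attrs,).
    """
    v = X @ beta                              # deterministic utilities
    v = v - v.max(axis=1, keepdims=True)      # stabilise the exponentials
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)


def fit_mnl(X, chosen, n_iter=25, tol=1e-10):
    """Maximum-likelihood fit of a conditional (multinomial) logit via Newton steps."""
    n_sets, n_alts, n_attrs = X.shape
    y = np.zeros((n_sets, n_alts))
    y[np.arange(n_sets), chosen] = 1.0        # one-hot encoding of the chosen alternative
    beta = np.zeros(n_attrs)
    for _ in range(n_iter):
        p = choice_probabilities(X, beta)
        grad = np.einsum("nja,nj->a", X, y - p)           # score vector
        xbar = np.einsum("nja,nj->na", X, p)              # expected attributes per set
        # Information matrix: sum over sets of the attribute covariance under p.
        info = np.einsum("nja,nj,njb->ab", X, p, X) - xbar.T @ xbar
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def implied_wtp(beta, price_idx, attr_idx):
    """WTP for an attribute: minus its coefficient divided by the price coefficient."""
    return -beta[attr_idx] / beta[price_idx]


def simulate_choices(true_beta, n_sets=5000, n_alts=3, seed=0):
    """Synthetic choice data (assumption): a price and a binary 'wifi' attribute."""
    rng = np.random.default_rng(seed)
    price = rng.uniform(50.0, 150.0, size=(n_sets, n_alts))
    wifi = rng.integers(0, 2, size=(n_sets, n_alts)).astype(float)
    X = np.stack([price, wifi], axis=-1)
    # Gumbel-distributed noise makes the argmax rule consistent with the logit model.
    utility = X @ true_beta + rng.gumbel(size=(n_sets, n_alts))
    return X, utility.argmax(axis=1)


# Hypothetical ground truth: price coefficient -0.02, wifi coefficient 0.6.
true_beta = np.array([-0.02, 0.6])
X, chosen = simulate_choices(true_beta)
beta_hat = fit_mnl(X, chosen)
print("estimated coefficients:", beta_hat)
print("implied WTP for wifi:", implied_wtp(beta_hat, price_idx=0, attr_idx=1))
```

In a setting like the one described here, `chosen` would hold the option an LLM selected in each choice dilemma rather than simulated choices, and the resulting ratio could then be set against a human benchmark value for the same attribute.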