Can LLMs Capture Human Preferences?

We explore the viability of Large Language Models (LLMs), specifically OpenAI's GPT-3.5 and GPT-4, in emulating human survey respondents and eliciting preferences, with a focus on intertemporal choices. Using the extensive literature on intertemporal discounting as a benchmark, we examine responses from LLMs across various languages and compare them to human responses on choices between smaller, sooner rewards and larger, later rewards. Our findings reveal that both GPT models are less patient than humans, with GPT-3.5 exhibiting a lexicographic preference for earlier rewards, unlike human decision-makers. Though GPT-4 does not display lexicographic preferences, its measured discount rates are still considerably larger than those found in humans. Interestingly, GPT models show greater patience in languages with weak future tense references, such as German and Mandarin, in line with existing literature that suggests a correlation between language structure and intertemporal preferences. We demonstrate how prompting GPT to explain its decisions, a procedure we term "chain-of-thought conjoint," can mitigate, but does not eliminate, discrepancies between LLM and human responses. While directly eliciting preferences using LLMs may yield misleading results, combining chain-of-thought conjoint with topic modeling aids hypothesis generation, enabling researchers to explore the underpinnings of preferences. Chain-of-thought conjoint provides a structured framework for marketers to use LLMs to identify potential attributes or factors that can explain preference heterogeneity across different customers and contexts.
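To make the terms "discount rate" and "lexicographic preference" concrete, here is a minimal Python sketch. It assumes standard exponential discounting and uses invented reward amounts and delays; the abstract does not describe the paper's actual elicitation design or estimation method, so the function names and numbers below are hypothetical illustrations only.

```python
import math


def implied_discount_rate(sooner_amount, later_amount, delay_years):
    """Rate r at which a respondent is indifferent between the two rewards.

    Assumes exponential discounting (an assumption of this sketch):
    sooner_amount == later_amount * exp(-r * delay_years).
    """
    if sooner_amount <= 0 or later_amount <= 0 or delay_years <= 0:
        raise ValueError("amounts and delay must be positive")
    return math.log(later_amount / sooner_amount) / delay_years


def chooses_later(rate, sooner_amount, later_amount, delay_years):
    """True if a respondent with discount rate `rate` picks the later reward."""
    discounted_later = later_amount * math.exp(-rate * delay_years)
    return discounted_later > sooner_amount


def looks_lexicographic_for_sooner(choices):
    """True if the sooner reward was picked in every choice task.

    `choices` holds one boolean per task (True = chose the later reward).
    Always picking the sooner option, whatever the amounts, is the pattern
    this sketch treats as a lexicographic preference for earlier rewards.
    """
    return len(choices) > 0 and not any(choices)


# Hypothetical task: 100 now versus 110 in one year.
break_even = implied_discount_rate(100, 110, 1.0)
patient = chooses_later(0.05, 100, 110, 1.0)    # rate below break-even
impatient = chooses_later(0.20, 100, 110, 1.0)  # rate above break-even
```

Under these assumptions, a respondent whose rate is below the break-even value waits for the larger reward, while one above it takes the smaller reward now. A respondent who takes the sooner reward in every task, however large the later reward becomes, cannot be assigned a finite rate this way, which is why such a pattern is treated separately from merely high discounting.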

