We explore the viability of Large Language Models (LLMs), specifically OpenAI's GPT-3.5 and GPT-4, in emulating human survey respondents and eliciting preferences, with a focus on intertemporal choices. Leveraging the extensive literature on intertemporal discounting for benchmarking, we examine responses from LLMs across multiple languages and compare them to human responses on choices between smaller-sooner and larger-later rewards. Our findings reveal that both GPT models are less patient than humans, with GPT-3.5 exhibiting a lexicographic preference for earlier rewards, unlike human decision-makers. Although GPT-4 does not display lexicographic preferences, its measured discount rates are still considerably larger than those found in humans. Interestingly, GPT models show greater patience in languages with weak future tense references, such as German and Mandarin, aligning with existing literature that suggests a correlation between language structure and intertemporal preferences. We demonstrate how prompting GPT to explain its decisions, a procedure we term "chain-of-thought conjoint," mitigates but does not eliminate discrepancies between LLM and human responses. While directly eliciting preferences using LLMs may yield misleading results, combining chain-of-thought conjoint with topic modeling aids in hypothesis generation, enabling researchers to explore the underpinnings of preferences. Chain-of-thought conjoint provides a structured framework for marketers to use LLMs to identify potential attributes or factors that can explain preference heterogeneity across different customers and contexts.
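To make the notion of a "measured discount rate" concrete, the following is a minimal sketch under the assumption of standard exponential discounting with continuous compounding. The abstract does not state which discounting model or estimation procedure is actually used, so the function names, parameters, and functional form here are hypothetical and purely illustrative.

```python
import math


def implied_annual_discount_rate(sooner_amount, later_amount, delay_years):
    """Rate r at which a respondent is indifferent between the two rewards.

    Assumes exponential discounting (an illustrative assumption):
        sooner_amount = later_amount * exp(-r * delay_years)
    """
    if sooner_amount <= 0 or later_amount <= 0 or delay_years <= 0:
        raise ValueError("amounts and delay must be positive")
    return math.log(later_amount / sooner_amount) / delay_years


def prefers_later(sooner_amount, later_amount, delay_years, rate):
    """True if the discounted larger-later reward beats the smaller-sooner one."""
    discounted_later = later_amount * math.exp(-rate * delay_years)
    return discounted_later > sooner_amount


if __name__ == "__main__":
    # Indifference between 100 now and 110 in one year implies r = ln(1.1).
    r = implied_annual_discount_rate(100, 110, 1.0)
    print(f"implied annual rate: {r:.4f}")
    # A more patient agent (lower rate) waits; a less patient one does not.
    print(prefers_later(100, 110, 1.0, rate=0.05))
    print(prefers_later(100, 110, 1.0, rate=0.20))
```

Under this sketch, a respondent who always picks the earlier reward regardless of the amounts, as in the lexicographic pattern described for GPT-3.5, has no finite indifference point, so no finite rate can be recovered from their choices.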