The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective

Large Language Models (LLMs) have demonstrated impressive potential to simulate human behavior. Using a causal inference framework, we empirically and theoretically analyze the challenges of conducting LLM-simulated experiments and explore potential solutions. In the context of demand estimation, we show that variations in the treatment included in the prompt (e.g., the price of the focal product) can cause variations in unspecified confounding factors (e.g., competitors' prices, historical prices, outside temperature), introducing endogeneity and yielding implausibly flat demand curves. We propose a theoretical framework suggesting that this endogeneity issue generalizes to other contexts and will not be fully resolved merely by improving the training data. Unlike real experiments, where researchers assign pre-existing units across conditions, LLMs simulate units based on the entire prompt, which includes the description of the treatment. Therefore, because of associations in the training data, the characteristics of the individuals and environments simulated by the LLM can be affected by the treatment assignment. We explore two potential solutions. The first specifies all contextual variables that affect both the treatment and the outcome, which we demonstrate to be challenging for a general-purpose LLM. The second explicitly specifies the source of treatment variation in the prompt given to the LLM (e.g., by informing the LLM that the store is running an experiment). While this approach only allows the estimation of a conditional average treatment effect that depends on the specific experimental design, it provides valuable directional results for exploratory analysis.
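The flattening mechanism can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' method: no LLM is called. Instead, a hypothetical linear demand model (`quantity = 20 - b * own_price + c * competitor_price + noise`, with made-up `b = 2.0`, `c = 1.5`, and price ranges) stands in for the simulator. The three hypothetical modes mimic the prompt designs discussed above: the competitor price left unspecified and inferred so that it co-moves with the focal price, pinned in the prompt, or drawn independently because the prompt says prices are randomized. The names `simulate` and `ols_slope` are illustrative only.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate(mode, n=20_000, b=2.0, c=1.5, seed=0):
    """Return (own_prices, quantities) from a HYPOTHETICAL toy demand model.

    quantity = 20 - b * own_price + c * competitor_price + noise

    mode = "unspecified": competitor price is absent from the "prompt", and the
        simulator fills it in so that it co-moves with the own price (confounding).
    mode = "specified":   the "prompt" pins the competitor price at a fixed value.
    mode = "experiment":  the "prompt" says own prices are randomized, so the
        filled-in competitor price is independent of the own price.
    """
    rng = random.Random(seed)
    own, qty = [], []
    for _ in range(n):
        p = rng.uniform(1.0, 10.0)  # focal (treatment) price
        if mode == "unspecified":
            p_comp = p + rng.gauss(0.0, 1.0)  # rises with the focal price
        elif mode == "specified":
            p_comp = 5.0  # held fixed across all simulated units
        elif mode == "experiment":
            p_comp = rng.uniform(1.0, 10.0)  # unrelated to the focal price
        else:
            raise ValueError(f"unknown mode: {mode}")
        own.append(p)
        qty.append(20.0 - b * p + c * p_comp + rng.gauss(0.0, 1.0))
    return own, qty


for mode in ("unspecified", "specified", "experiment"):
    print(mode, round(ols_slope(*simulate(mode)), 2))
```

With these assumed parameters the true own-price slope is -2. In the "unspecified" mode the competitor price rises one-for-one with the focal price on average, so the estimated slope should land near -2 + 1.5 = -0.5, a much flatter curve; the other two modes should land near -2. Because the toy model is linear with a constant effect, it cannot reproduce the caveat that the experimental framing identifies a design-specific conditional average treatment effect.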
