The instrumental variables (IVs) method is a leading empirical strategy for causal inference. Finding IVs is a heuristic and creative process, and justifying their validity, especially the exclusion restrictions, is largely rhetorical. We propose using large language models (LLMs) to search for new IVs through narratives and counterfactual reasoning, much as a human researcher would. The stark difference, however, is that LLMs can dramatically accelerate this process and explore an extremely large search space. We demonstrate how to construct prompts that search for potentially valid IVs. We contend that multi-step and role-playing prompting strategies are effective for simulating the endogenous decision-making processes of economic agents and for guiding language models through real-world scenarios. We apply our method to three well-known examples in economics: returns to schooling, supply and demand, and peer effects. We then extend our strategy to finding (i) control variables in regression and difference-in-differences designs and (ii) running variables in regression discontinuity designs.