To estimate the causal effect of an endogenous treatment using clustered data, canonical two-stage least squares (2SLS) regresses the outcome on treatment status, instrumenting the treatment with an instrumental variable (IV), and conducts inference with cluster-robust standard errors. When both the treatment and the IV vary within clusters, an alternative, two-stage least squares with fixed effects (2SFE), additionally includes cluster indicators in the regression, thereby incorporating cluster information into point estimation as well. This paper studies the trade-off between these two approaches within the local average treatment effect (LATE) framework. When clusters are homogeneous, we show that both approaches yield valid large-sample inference for the LATE, and that 2SFE is more efficient than canonical 2SLS only when the variation in cluster-specific effects dominates idiosyncratic variation and the IV has sufficient within-cluster variation. When clusters are heterogeneous, we show that 2SFE identifies a weighted average of cluster-specific LATEs, whereas canonical 2SLS generally does not. We further propose a test for detecting cluster heterogeneity.
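To make the two point estimators concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single binary IV, a constant treatment effect (so the LATE equals that constant and clusters are homogeneous), and a simulated data-generating process. The function names `simulate`, `demean_within`, `tsls`, and `tsfe` and all parameter values are hypothetical choices made for illustration. With one instrument, canonical 2SLS reduces to a ratio of sample covariances, and including cluster indicators is equivalent to demeaning every variable within its cluster before taking the same ratio. Cluster-robust standard errors are omitted.

```python
import numpy as np


def simulate(n_clusters=400, cluster_size=25, late=2.0, seed=0):
    """Simulated clustered data (illustrative assumption, not from the paper)."""
    rng = np.random.default_rng(seed)
    g = np.repeat(np.arange(n_clusters), cluster_size)  # cluster labels
    n = g.size
    alpha = rng.normal(0.0, 2.0, n_clusters)[g]  # cluster-specific effect
    z = rng.binomial(1, 0.5, n).astype(float)    # IV varies within clusters
    u = rng.normal(size=n)                       # unobserved confounder
    # Treatment uptake is monotone in z and depends on u (endogeneity).
    d = (0.8 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
    y = late * d + alpha + u + rng.normal(size=n)
    return y, d, z, g


def demean_within(x, g):
    """Subtract the cluster mean from each observation."""
    cluster_means = np.bincount(g, weights=x) / np.bincount(g)
    return x - cluster_means[g]


def tsls(y, d, z):
    """Canonical 2SLS with an intercept and one IV: cov(z, y) / cov(z, d)."""
    zc = z - z.mean()
    return (zc @ y) / (zc @ d)


def tsfe(y, d, z, g):
    """2SLS with cluster fixed effects, computed via within-cluster demeaning."""
    zt = demean_within(z, g)
    return (zt @ demean_within(y, g)) / (zt @ demean_within(d, g))


y, d, z, g = simulate()
print("2SLS estimate:", tsls(y, d, z))
print("2SFE estimate:", tsfe(y, d, z, g))
```

In this design the cluster effects are large relative to the idiosyncratic noise and the IV varies freely within clusters, which is the regime in which the abstract states that 2SFE can be more efficient. Comparing the spread of the two estimates across repeated seeds is one way to explore that claim empirically.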