Two-stage least squares with clustered data

Clustered data -- where units of observation are nested within higher-level groups, such as repeated measurements on users, or panel data of firms, industries, or geographic regions -- are ubiquitous in business research. When the objective is to estimate the causal effect of a potentially endogenous treatment, a common approach -- which we call the canonical two-stage least squares (2sls) -- is to fit a 2sls regression of the outcome on treatment status with instrumental variables (IVs) for point estimation, and apply cluster-robust standard errors to account for clustering in inference. When both the treatment and IVs vary within clusters, a natural alternative -- which we call the two-stage least squares with fixed effects (2sfe) -- is to include cluster indicators in the 2sls specification, thereby incorporating cluster information in point estimation as well. This paper clarifies the trade-off between these two approaches within the local average treatment effect (LATE) framework, and makes three contributions. First, we establish the validity of both approaches for Wald-type inference of the LATE when clusters are homogeneous, and characterize their relative efficiency. We show that, when the true outcome model includes cluster-specific effects, 2sfe is more efficient than the canonical 2sls only when the variation in cluster-specific effects dominates that in unit-level errors. Second, we show that with heterogeneous clusters, 2sfe recovers a weighted average of cluster-specific LATEs, whereas the canonical 2sls generally does not. Third, to guide empirical choice between the two procedures, we develop a joint asymptotic theory for the two estimators under homogeneous clusters, and propose a Wald-type test for detecting cluster heterogeneity.
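The two procedures compared above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a single binary instrument and a single binary treatment (the just-identified case), uses a plain cluster-robust sandwich variance with no small-sample correction, and implements 2sfe by within-cluster demeaning, which gives the same coefficient as including cluster indicators by the Frisch-Waugh-Lovell theorem. The function name `tsls`, the simulated data-generating process, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def tsls(y, d, z, cluster, fixed_effects=False):
    """Just-identified 2SLS of outcome y on treatment d with instrument z.

    fixed_effects=False: canonical 2sls (intercept only).
    fixed_effects=True:  2sfe, via within-cluster demeaning, which is
                         equivalent to adding cluster indicators.
    Returns (point estimate, cluster-robust standard error).
    """
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    _, g = np.unique(np.asarray(cluster), return_inverse=True)
    g = g.ravel()
    counts = np.bincount(g)

    def center(a):
        if fixed_effects:
            # subtract each unit's cluster mean
            return a - (np.bincount(g, weights=a) / counts)[g]
        # subtract the grand mean (partials out the intercept)
        return a - a.mean()

    y, d, z = center(y), center(d), center(z)
    zd = z @ d
    beta = (z @ y) / zd                      # IV (Wald-type) ratio
    resid = y - d * beta
    # sum of z * residual within each cluster, then sandwich formula
    scores = np.bincount(g, weights=z * resid)
    se = np.sqrt(np.sum(scores ** 2)) / abs(zd)
    return beta, se

# Simulated example (assumed design): homogeneous clusters, constant effect 2.0
rng = np.random.default_rng(0)
G, n = 200, 20                               # clusters, units per cluster
cluster = np.repeat(np.arange(G), n)
alpha = rng.normal(size=G)[cluster]          # cluster-specific effect
v = rng.normal(size=G * n)                   # unobserved confounder
z = rng.integers(0, 2, size=G * n).astype(float)  # instrument varies within cluster
d = (0.8 * z + v > 0.5).astype(float)        # endogenous binary treatment
y = 2.0 * d + alpha + v + rng.normal(size=G * n)

for name, fe in [("canonical 2sls", False), ("2sfe", True)]:
    b, se = tsls(y, d, z, cluster, fixed_effects=fe)
    print(f"{name}: estimate={b:.3f}, cluster-robust se={se:.3f}")
```

In this simulated design the treatment effect is constant, so both estimators target the same LATE. The sketch does not reproduce the paper's efficiency comparison or its proposed test for cluster heterogeneity.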
