This paper introduces aggregate Bayesian Causal Forests (aBCF), a new Bayesian model for causal inference using aggregated data. Aggregated data are common in policy evaluations where we observe individuals, such as students, but participation in an intervention is determined at a higher level of aggregation, such as schools implementing a curriculum. Such interventions often involve millions of individuals but far fewer higher-level units, making aggregation computationally attractive. To analyze aggregated data, a model must account for heteroskedasticity and intraclass correlation (ICC). Like Bayesian Causal Forests (BCF), aBCF estimates heterogeneous treatment effects with minimal parametric assumptions, but it also accounts for these features of aggregated data, improving estimation of average and aggregate unit-specific effects. After introducing the aBCF model, we demonstrate via simulation that aBCF outperforms BCF on aggregated data. We anchor our simulation on an evaluation of a large-scale Medicare primary care model. In these simulations, aBCF produces treatment effect estimates with lower root mean squared error and narrower uncertainty intervals than BCF while achieving the same level of coverage. We also show that aBCF is not sensitive to the prior distribution used and that its estimation improvements relative to BCF decline as the ICC approaches one. Code is available at https://github.com/mathematica-mpr/bcf-1.
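To make the role of heteroskedasticity and the ICC concrete, the sketch below computes the variance of a group mean under a standard exchangeable (random-intercept) assumption: for a group of n individuals with total outcome variance sigma2 and intraclass correlation icc, the variance of the mean is sigma2 * (1 + (n - 1) * icc) / n. This is a textbook identity used for illustration only, not the aBCF model itself, and the function name `variance_of_group_mean` is hypothetical.

```python
def variance_of_group_mean(n, sigma2, icc):
    """Variance of the mean of n exchangeable outcomes.

    Assumes each outcome has total variance sigma2 and every pair of
    outcomes in the group has correlation icc (illustrative assumption).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sigma2 * (1.0 + (n - 1) * icc) / n


# Compare how the variance of an aggregated outcome changes with group size.
group_sizes = [10, 100, 1000]
for icc in (0.0, 0.05, 1.0):
    variances = [variance_of_group_mean(n, 1.0, icc) for n in group_sizes]
    print(f"icc={icc}: {variances}")
```

Under this assumption, groups of different sizes have different variances whenever the ICC is below one, which is the heteroskedasticity an aggregate-level model has to handle. When the ICC equals one, the variance no longer depends on group size, which may help explain why the gains over BCF shrink in that regime.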