Flexible machine learning tools are increasingly used to estimate heterogeneous treatment effects. This paper is an accessible tutorial on the causal forest algorithm, available in the R package grf. We start with a brief non-technical overview of treatment effect estimation methods. We focus on estimation in observational studies, although the same techniques also apply to experimental studies. We then explain the logic of estimating heterogeneous effects with the extension of the random forest algorithm implemented in grf. Finally, we illustrate the causal forest with a secondary analysis of US Army soldiers deploying to Afghanistan. The analysis asks to what extent individual differences in resilience to high combat stress can be measured using information available about these soldiers before deployment. We illustrate simple, interpretable exercises for model selection and evaluation, including targeting operator characteristic (TOC) curves, Qini curves, area-under-the-curve summaries, and best linear projections. A replication script with simulated data is available at https://github.com/grf-labs/grf/tree/master/experiments/ijmpr.
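To make the TOC curve and its area summaries concrete, here is a minimal Python sketch (not grf's R implementation). It assumes each unit has a priority score (for example, an estimated conditional treatment effect from a held-out model) and a per-unit score whose mean estimates the average treatment effect, such as a doubly robust score. TOC(q) is then estimated as the mean score among the top q fraction of units ranked by priority, minus the overall mean score. The area under the TOC curve (AUTOC) integrates TOC(q) over q. One common Qini-style summary instead integrates q·TOC(q). The function names `toc_curve` and `area_summaries` are hypothetical. In grf these quantities, along with their standard errors, come from `rank_average_treatment_effect`; the sketch below computes point estimates only.

```python
import numpy as np


def toc_curve(priority, scores, qs):
    """Estimate TOC(q) at each q in qs.

    priority: hypothetical per-unit ranking scores (higher = treat first).
    scores:   assumed per-unit scores whose mean estimates the ATE
              (e.g. doubly robust scores; an assumption, not grf output).
    """
    priority = np.asarray(priority, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Sort units from highest to lowest priority.
    order = np.argsort(-priority, kind="stable")
    ranked = scores[order]
    n = len(ranked)
    overall = ranked.mean()
    toc = []
    for q in qs:
        # Size of the top-q group (at least one unit).
        k = max(1, int(np.ceil(q * n)))
        # Mean effect among prioritized units minus overall mean effect.
        toc.append(ranked[:k].mean() - overall)
    return np.array(toc)


def area_summaries(toc, qs):
    """Riemann-sum approximations of AUTOC = ∫ TOC(q) dq and Qini = ∫ q TOC(q) dq."""
    qs = np.asarray(qs, dtype=float)
    # Width of each step on the q grid, starting from q = 0.
    dq = np.diff(np.concatenate(([0.0], qs)))
    autoc = float(np.sum(toc * dq))
    qini = float(np.sum(qs * toc * dq))
    return autoc, qini


if __name__ == "__main__":
    # Toy example: priorities perfectly rank the (simulated) scores.
    qs = [0.25, 0.5, 0.75, 1.0]
    toc = toc_curve([4, 3, 2, 1], [4, 3, 2, 1], qs)
    print(toc, area_summaries(toc, qs))
```

A positive, decreasing TOC curve suggests that the priority scores identify units with above-average effects. A curve near zero suggests little detectable heterogeneity along that ranking. AUTOC weights all values of q equally, while the Qini-style summary places more weight on larger treated fractions.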