Flexible machine learning tools are increasingly used to estimate heterogeneous treatment effects. This paper gives an accessible tutorial demonstrating the use of the causal forest algorithm, available in the R package grf. We start with a brief non-technical overview of treatment effect estimation methods, focusing on estimation in observational studies, although similar methods can be used in experimental studies. We then discuss the logic of estimating heterogeneous effects using the extension of the random forest algorithm implemented in grf. Finally, we illustrate causal forest with a secondary analysis of the extent to which individual differences in resilience to high combat stress can be measured among US Army soldiers deploying to Afghanistan, based on information about these soldiers available prior to deployment. Throughout, we illustrate simple and interpretable exercises for both model selection and evaluation, including targeting operator characteristic curves, Qini curves, area-under-the-curve summaries, and best linear projections. A replication script with simulated data is available at github.com/grf-labs/grf/tree/master/experiments/ijmpr.