This study investigates estimation of and statistical inference for Conditional Average Treatment Effects (CATEs), which have garnered attention as a measure of individualized causal effects. In our data-generating process, we assume a linear model for the outcome under each arm of a binary treatment and define the CATE as the difference between the expected outcomes of these two linear models. We allow the linear models to be high-dimensional, and our interest lies in consistent estimation and statistical inference for the CATE. In high-dimensional linear regression, a typical approach is to assume sparsity of the regression coefficients. We do not impose sparsity on each outcome model directly. Instead, we assume sparsity only in the difference between the two linear models. We first use a doubly robust estimator to approximate this difference and then regress the approximated difference on the covariates with Lasso regularization. Although this regression estimator is consistent for the CATE, we further reduce its bias using techniques from double/debiased machine learning (DML) and the debiased Lasso, which yields $\sqrt{n}$-consistency and confidence intervals. We refer to the resulting debiased estimator as the triple/debiased Lasso (TDL), since it applies both DML and debiased Lasso techniques. We confirm the soundness of the proposed method through simulation studies.
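The setup described above can be written out as follows. This is only a sketch in notation we introduce for illustration: the symbols $D$, $X$, $Y(d)$, $\beta_d$, $\hat\mu_d$, $\hat e$, $\hat\psi_i$, and $\lambda$ are assumptions and do not appear in the abstract, and the pseudo-outcome shown is the standard augmented inverse-probability-weighting form, which may differ in detail from the one used in the paper. With binary treatment $D \in \{0,1\}$, covariates $X \in \mathbb{R}^p$, and potential outcomes $Y(d)$, the linear outcome models and the CATE are

$$
Y(d) = X^\top \beta_d + \varepsilon_d, \qquad \mathbb{E}[\varepsilon_d \mid X] = 0, \qquad d \in \{0,1\},
$$

$$
\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x] = x^\top(\beta_1 - \beta_0) = x^\top \beta, \qquad \beta := \beta_1 - \beta_0 .
$$

Sparsity is placed on $\beta$ only, so $\beta_1$ and $\beta_0$ may each be dense. Given estimates $\hat\mu_d(x)$ of $\mathbb{E}[Y \mid X = x, D = d]$ and $\hat e(x)$ of $\Pr(D = 1 \mid X = x)$, a doubly robust pseudo-outcome and the subsequent Lasso regression can be sketched as

$$
\hat\psi_i = \hat\mu_1(X_i) - \hat\mu_0(X_i) + \frac{D_i\,\bigl(Y_i - \hat\mu_1(X_i)\bigr)}{\hat e(X_i)} - \frac{(1 - D_i)\,\bigl(Y_i - \hat\mu_0(X_i)\bigr)}{1 - \hat e(X_i)},
$$

$$
\hat\beta = \arg\min_{b \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \bigl(\hat\psi_i - X_i^\top b\bigr)^2 + \lambda \lVert b \rVert_1 .
$$

The debiasing step that leads to $\sqrt{n}$-consistency is applied on top of $\hat\beta$ and is not reproduced here, since the abstract does not specify its form.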