In causal inference with two treatments, the conditional average treatment effect (CATE) plays an important role as a measure of individualized causal effect. It is defined as the difference between the expected outcomes under the two treatments, conditioned on covariates. This study assumes a linear regression model relating the potential outcome to the covariates for each of the two treatments and defines the CATE as the difference between the two models. We then propose a method for consistently estimating the CATE even when the parameters are high-dimensional and non-sparse. We show that desirable theoretical properties such as consistency remain attainable without assuming sparsity explicitly, provided we adopt a weaker assumption, called implicit sparsity, that originates from the definition of the CATE. Under this assumption, the parameters of the linear models for the potential outcomes can be divided into treatment-specific and common parameters: the treatment-specific parameters take different values in the two regression models, while the common parameters are identical. In the difference between the two models, the common parameters therefore cancel, leaving only the differences in the treatment-specific parameters, so the non-zero parameters of the CATE correspond to those differences. Leveraging this assumption, we develop a Lasso regression method specialized for CATE estimation and show that the resulting estimator is consistent. Finally, we confirm the soundness of the proposed method through simulation studies.
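
The implicit sparsity assumption can be illustrated numerically. The following is a minimal sketch, not the proposed estimator: the dimensions `p` and `s`, the variable names, and the choice of which coordinates are treatment-specific are all assumptions made for illustration. It builds two dense coefficient vectors that share their common part and checks that their difference, which defines the CATE under the linear models, has only `s` non-zero entries.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 200  # number of covariates (hypothetical value)
s = 5    # number of treatment-specific coefficients (hypothetical value)

# Dense common coefficients shared by both potential-outcome models.
common = rng.normal(size=p)

# Both models start from the common part; only the first s coordinates
# (an arbitrary choice for this illustration) are treatment-specific.
beta0 = common.copy()
beta1 = common.copy()
beta0[:s] = rng.normal(size=s)
beta1[:s] = beta0[:s] + rng.uniform(1.0, 2.0, size=s)

# The CATE coefficient vector is the difference of the two models.
delta = beta1 - beta0
support = np.flatnonzero(delta)


def cate(x, beta1, beta0):
    """CATE at covariate vector x as the difference of two linear models."""
    return x @ beta1 - x @ beta0


x = rng.normal(size=p)
cate_full = cate(x, beta1, beta0)          # uses all p coefficients
cate_sparse = x[support] @ delta[support]  # uses only the s differing ones

print("non-zero entries in beta0, beta1:",
      np.count_nonzero(beta0), np.count_nonzero(beta1))
print("non-zero entries in delta:", support.size)
print("CATE values agree:", bool(np.isclose(cate_full, cate_sparse)))
```

Neither coefficient vector is sparse on its own, yet the CATE depends on only `s` coordinates, which is the structure a Lasso-type penalty on the difference could exploit.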