This paper considers a multiple-environment linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across environments, yet the conditional expectation of $y$ given the unknown set of important variables is invariant across environments. Such a statistical model is related to the problems of endogeneity, causal inference, and transfer learning. Its motivation is illustrated by how the goals of prediction and attribution are inherent in estimating the true parameter and the important variable set. We construct a novel {\it environment invariant linear least squares (EILLS)} objective function, a multiple-environment version of linear least squares that leverages the above conditional expectation invariance structure and the heterogeneity among environments to determine the true parameter. Our proposed method is applicable without any additional structural knowledge and can identify the true parameter under a near-minimal identification condition. We establish non-asymptotic $\ell_2$ bounds on the estimation error of the EILLS estimator in the presence of spurious variables. Moreover, we show that the EILLS estimator is able to eliminate all endogenous variables and that the $\ell_0$-penalized EILLS estimator can achieve variable selection consistency in high-dimensional regimes. These non-asymptotic results demonstrate the sample efficiency of the EILLS estimator and its capability to circumvent the curse of endogeneity in an algorithmic manner without any prior structural knowledge.
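The abstract does not write out the EILLS objective, so the sketch below assumes one form consistent with its description: a pooled least-squares risk $R(\beta)$ plus $\gamma$ times a penalty $J(\beta)$ that sums, over environments, the squared covariances $\mathbb{E}[x_j(y-\beta^\top x)]$ for the covariates $j$ in the support of $\beta$. The functions, the two-environment data-generating process, and the values of $\gamma$ are hypothetical illustrations, not the paper's implementation or experiments. For a fixed support the assumed objective is quadratic, so it is minimized in closed form, and enumerating all supports (feasible only for small $p$) recovers its global minimizer.

```python
import itertools
import numpy as np

def env_moments(X, y):
    """Empirical moments of one environment: Sigma = X^T X/n, c = X^T y/n, v = y^T y/n."""
    n = X.shape[0]
    return X.T @ X / n, X.T @ y / n, float(y @ y) / n

def objective_on_support(moments, weights, support, gamma):
    """Minimize the ASSUMED objective R(beta) + gamma * J_S(beta) over beta supported on S.

    Setting the gradient to zero gives A beta_S = b with
    A = sum_e w_e (I + gamma Sigma_e) Sigma_e and b = sum_e w_e (I + gamma Sigma_e) c_e.
    """
    p = moments[0][0].shape[0]
    S = list(support)
    beta = np.zeros(p)
    if S:
        k = len(S)
        A, b = np.zeros((k, k)), np.zeros(k)
        for w, (Sigma, c, _) in zip(weights, moments):
            Sig_S = Sigma[np.ix_(S, S)]
            M = np.eye(k) + gamma * Sig_S
            A += w * M @ Sig_S
            b += w * M @ c[S]
        beta[S] = np.linalg.solve(A, b)
    value = 0.0
    for w, (Sigma, c, v) in zip(weights, moments):
        risk = v - 2 * beta @ c + beta @ Sigma @ beta  # empirical E[(y - beta^T x)^2]
        cov = c - Sigma @ beta                          # empirical E[x (y - beta^T x)]
        value += w * (risk + gamma * np.sum(cov[S] ** 2))
    return value, beta

def eills_bruteforce(envs, gamma):
    """Enumerate every support; hypothetical helper, exponential in p (small p only)."""
    moments = [env_moments(X, y) for X, y in envs]
    n = np.array([X.shape[0] for X, _ in envs], dtype=float)
    weights = n / n.sum()  # environments weighted by sample size
    p = envs[0][0].shape[1]
    best = None
    for k in range(p + 1):
        for support in itertools.combinations(range(p), k):
            value, beta = objective_on_support(moments, weights, support, gamma)
            if best is None or value < best[0]:
                best = (value, support, beta)
    return best[1], best[2]

def simulate_env(rng, n, a):
    """Hypothetical environment: x1 is important, x2 is a spurious child of y."""
    x1 = rng.standard_normal(n)
    y = x1 + rng.standard_normal(n)      # E[y | x1] = x1 in every environment
    x2 = a * y + rng.standard_normal(n)  # relation between x2 and y shifts with a
    return np.column_stack([x1, x2]), y

rng = np.random.default_rng(0)
envs = [simulate_env(rng, 2000, a) for a in (1.0, 3.0)]
print(eills_bruteforce(envs, gamma=0.0))   # pooled least squares
print(eills_bruteforce(envs, gamma=20.0))  # invariance penalty switched on
```

With $\gamma = 0$ the sketch reduces to pooled least squares and places weight on the spurious $x_2$; with a large $\gamma$ no single coefficient vector can make the residual uncorrelated with $x_2$ in both environments, so the selected support shrinks to $\{x_1\}$ with a coefficient near 1.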