Optimal multi-environment causal regularization

In this manuscript we derive the optimal out-of-sample causal predictor for a linear system that has been observed in $k+1$ within-sample environments: $k$ shifted environments and one observational environment. Each environment corresponds to a linear structural equation model (SEM) with its own shift and noise vector, both in $L^2$. The shifts can be ordered by strength, so we may speak of all shifts that are at most as strong as a given shift. We consider the set $C^\gamma$ of all shifts that are at most $\gamma$ times as strong as a weighted average of the observed shift vectors, with weights on the unit sphere. For each $\beta\in\mathbb{R}^p$ we show that the supremum of the risk functions $R_{\tilde{A}}(\beta)$ over $\tilde{A}\in C^\gamma$ has a worst-risk decomposition into a (positive) linear combination of risk functions, depending on $\gamma$. We then define the causal regularizer, $\beta_\gamma$, as the argument $\beta$ that minimizes this worst-case risk. The main result of the paper is that this regularizer can be consistently estimated with a plug-in estimator outside a set of zero Lebesgue measure in the parameter space. A practical obstacle to such estimation is that it requires finding the roots of a polynomial of general degree, which cannot be done in closed form. We therefore also prove that an approximate plug-in estimator based on the bisection method is consistent. An interesting by-product of the proof of the main result is that plug-in estimation of the argmin of the maximum of a finite set of quadratic risk functions is consistent outside a set of zero Lebesgue measure in the parameter space.
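To illustrate the numerical step mentioned above, the following is a minimal sketch of the bisection method applied to a polynomial that has no closed-form root. The polynomial $x^5 - x - 1$, the bracketing interval $[1, 2]$, and the helper names `poly_eval` and `bisect_root` are assumptions chosen purely for illustration. They are not the polynomial or the estimator analysed in the manuscript.

```python
from typing import Callable, Sequence


def poly_eval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial by Horner's rule (coefficients, highest degree first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def bisect_root(
    f: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> float:
    """Approximate a root of f in [lo, hi] by bisection.

    Requires f(lo) and f(hi) to have opposite signs (or one of them to be zero).
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop once the midpoint is an exact root or the bracket is small enough.
        if f_mid == 0.0 or 0.5 * (hi - lo) < tol:
            return mid
        # Keep the half-interval on which the sign change occurs.
        if f_lo * f_mid < 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Hypothetical example: x^5 - x - 1, a quintic with no root expressible in radicals.
coeffs = [1.0, 0.0, 0.0, 0.0, -1.0, -1.0]
root = bisect_root(lambda x: poly_eval(coeffs, x), 1.0, 2.0)
```

Each iteration halves the bracketing interval, so the error after $n$ steps is at most $(\text{hi}-\text{lo})/2^{n}$. This deterministic error bound is what makes bisection a natural choice when an approximation error has to be controlled alongside a statistical one.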
