Optimal worst-risk minimization in structural equation models with random coefficients

The insight that causal parameters are particularly suitable for out-of-sample prediction has sparked a lot of development of causal-like predictors. However, the connection with strict causal targets has limited the development of predictors with good risk minimization properties but without a direct causal interpretation. In this manuscript we derive the optimal out-of-sample risk-minimizing predictor of a target $Y$ in a non-linear system $(X,Y)$, where the predictor has been trained in several within-sample environments. We consider data from an observation environment and several shifted environments. Each environment corresponds to a structural equation model (SEM) with random coefficients and with its own shift and noise vector, both in $L^2$. Unlike previous approaches, we also allow shifts in the target value. We define a sieve of out-of-sample environments, consisting of all shifts $\tilde{A}$ that are at most $\gamma$ times as strong as any weighted average of the observed shift vectors. For each $\beta\in\mathbb{R}^p$ we show that the supremum of the risk functions $R_{\tilde{A}}(\beta)$ has a worst-risk decomposition into a (positive) non-linear combination of risk functions, depending on $\gamma$. We then define the set $\mathcal{B}_\gamma$ as the set of minimizers of this worst risk. The main result of the paper is that, outside a set of zero Lebesgue measure in the parameter space, there is a unique minimizer ($|\mathcal{B}_\gamma|=1$) that can be consistently estimated by an explicit estimator. A practical obstacle for the initial method of estimation is that it involves solving polynomials of general degree. Therefore, we prove that an approximate estimator using the bisection method is also consistent.
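
To illustrate the last point, the following is a minimal sketch of the bisection method applied to a polynomial of degree five, for which no general closed-form root formula exists. It is not the estimator from the paper: the function name `bisect_root`, the example polynomial `poly`, the bracketing interval, and the tolerance are all assumptions chosen only for illustration.

```python
from typing import Callable


def bisect_root(f: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-10, max_iter: int = 200) -> float:
    """Approximate a root of f in [lo, hi] by bisection.

    Assumes f is continuous and f(lo), f(hi) have opposite signs.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop on an exact root or once the bracket is narrow enough.
        if f_mid == 0.0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid               # sign change lies in the left half
        else:
            lo, f_lo = mid, f_mid  # sign change lies in the right half
    return 0.5 * (lo + hi)


def poly(x: float) -> float:
    # Hypothetical quintic used only as an example: x^5 - x - 1.
    return x**5 - x - 1.0


# poly(1) < 0 < poly(2), so [1, 2] brackets a root.
root = bisect_root(poly, 1.0, 2.0)
```

Each iteration halves the bracketing interval, so the approximation error shrinks geometrically regardless of the polynomial's degree, which is what makes an approximate estimator of this kind practical when explicit root formulas are unavailable.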
