This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) + g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, with $f \in F$ and $g \in G$, fit on a given training distribution but evaluated on a test distribution that exhibits covariate shift. We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to $\textbf{heterogeneous covariate shifts}$ in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. Our analysis proceeds by demonstrating that ERM behaves $\textbf{qualitatively similarly to orthogonal machine learning}$: the rate at which ERM recovers the $f$-component of the predictor has only a lower-order dependence on the complexity of the class $G$, adjusted for the partial non-identifiability introduced by the additive structure. These results rely on a novel H\"older-style inequality for the Dudley integral, which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.