This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, with $f \in F$ and $g \in G$, fit on a given training distribution but evaluated on a test distribution that exhibits covariate shift. We show that, when the class $F$ is "simpler" than $G$ (measured, e.g., in terms of its metric entropy), our predictor is more resilient to heterogeneous covariate shifts in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. Our analysis proceeds by demonstrating that ERM behaves qualitatively similarly to orthogonal machine learning: the rate at which ERM recovers the $f$-component of the predictor has only a lower-order dependence on the complexity of the class $G$, adjusted for partial non-identifiability introduced by the additive structure. These results rely on a novel H\"older-style inequality for the Dudley integral, which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.
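As a sketch of the setup, one way to write it down is the following. It assumes a real-valued target and the squared loss, neither of which the abstract states explicitly, and the symbols $n$, $P_{\mathrm{train}}$, $P_{\mathrm{test}}$, $\hat f$ and $\hat g$ are notation introduced here for illustration only. Given $n$ samples $(\mathbf{x}_i,\mathbf{y}_i,\mathbf{z}_i)$ drawn from $P_{\mathrm{train}}$, ERM selects
$$(\hat f,\hat g) \in \arg\min_{f \in F,\; g \in G} \frac{1}{n}\sum_{i=1}^{n} \big(f(\mathbf{x}_i) + g(\mathbf{y}_i) - \mathbf{z}_i\big)^2,$$
and, under these assumptions, its performance under covariate shift could be measured by the excess risk on the test distribution,
$$\mathbb{E}_{(\mathbf{x},\mathbf{y}) \sim P_{\mathrm{test}}}\Big[\big(\hat f(\mathbf{x}) + \hat g(\mathbf{y}) - f_\star(\mathbf{x}) - g_\star(\mathbf{y})\big)^2\Big].$$
One simple source of the partial non-identifiability is that, for any constant $c$, the pair $(f_\star + c,\; g_\star - c)$ defines the same additive predictor as $(f_\star, g_\star)$, so the $f$-component can be recovered at best up to such shifts.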