This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) + g_{\star}(\mathbf{y})$. We study the performance of empirical risk minimization (ERM) over functions $f+g$, with $f \in \mathcal{F}$ and $g \in \mathcal{G}$, fit on a given training distribution but evaluated on a test distribution that exhibits covariate shift. We show that, when the class $\mathcal{F}$ is "simpler" than $\mathcal{G}$ (measured, e.g., in terms of its metric entropy), the ERM predictor is more resilient to \emph{heterogeneous covariate shifts}, in which the shift in $\mathbf{x}$ is much greater than that in $\mathbf{y}$. These results rely on a novel H\"older-style inequality for the Dudley integral, which may be of independent interest. Moreover, we corroborate our theoretical findings with experiments demonstrating improved resilience to shifts in "simpler" features across numerous domains.
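For concreteness, one way to write this setup is sketched here. The squared loss, the $n$ i.i.d. training samples $(\mathbf{x}_i,\mathbf{y}_i,\mathbf{z}_i)$, and the notation $\mathbb{E}_{\mathrm{test}}$ for expectation under the test distribution are illustrative assumptions, not taken from the abstract. Under these assumptions, the ERM predictor is
\[
(\hat f, \hat g) \in \operatorname*{arg\,min}_{f \in \mathcal{F},\; g \in \mathcal{G}} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( f(\mathbf{x}_i) + g(\mathbf{y}_i) - \mathbf{z}_i \bigr)^2 ,
\]
and its performance under covariate shift can be measured by the excess risk on the test distribution,
\[
\mathbb{E}_{\mathrm{test}} \Bigl[ \bigl( \hat f(\mathbf{x}) + \hat g(\mathbf{y}) - f_\star(\mathbf{x}) - g_\star(\mathbf{y}) \bigr)^2 \Bigr] .
\]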