This paper establishes the generalization error of pooled min-$\ell_2$-norm interpolation in transfer learning where data from diverse distributions are available. Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms. Previous work characterized their out-of-distribution risk when samples from the test distribution are unavailable during training. However, in many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolation in this setting are not well-understood. We address this gap by characterizing the bias and variance of pooled min-$\ell_2$-norm interpolation under covariate and model shifts. The pooled interpolator captures both early fusion and a form of intermediate fusion. Our results have several implications: under model shift, for low signal-to-noise ratio (SNR), adding data always hurts. For higher SNR, transfer learning helps as long as the shift-to-signal ratio (SSR) lies below a threshold that we characterize explicitly. By consistently estimating these ratios, we provide a data-driven method to determine: (i) when the pooled interpolator outperforms the target-based interpolator, and (ii) the optimal number of target samples that minimizes the generalization error. Under covariate shift, if the source sample size is small relative to the dimension, heterogeneity between domains improves the risk, and vice versa. We establish a novel anisotropic local law to achieve these characterizations, which may be of independent interest in random matrix theory. We supplement our theoretical characterizations with comprehensive simulations that demonstrate the finite-sample efficacy of our results.
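The pooled interpolator studied above can be sketched concretely. The following is a minimal numpy illustration, not the paper's experimental setup: all dimensions, sample sizes, and noise levels are assumed for demonstration. It fits the minimum-$\ell_2$-norm solution interpolating the stacked (early-fused) source and target data, alongside the target-only interpolator it is compared against.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50                       # ambient dimension (overparameterized: p > n_s + n_t)
n_s, n_t = 20, 10            # source and target sample sizes (assumed values)
sigma = 0.1                  # noise level (assumed)

beta = rng.normal(size=p) / np.sqrt(p)   # shared signal; no model shift in this toy setup
X_s = rng.normal(size=(n_s, p))          # source covariates
X_t = rng.normal(size=(n_t, p))          # target covariates
y_s = X_s @ beta + sigma * rng.normal(size=n_s)
y_t = X_t @ beta + sigma * rng.normal(size=n_t)

# Pooled min-l2-norm interpolator: the minimum-norm beta satisfying
# X beta = y on the stacked source + target data. For p > n with full
# row rank, pinv(X) @ y equals X^T (X X^T)^{-1} y.
X = np.vstack([X_s, X_t])
y = np.concatenate([y_s, y_t])
beta_pooled = np.linalg.pinv(X) @ y

# It interpolates every pooled sample exactly (up to numerical error).
assert np.allclose(X @ beta_pooled, y)

# Target-only min-norm interpolator, for comparison.
beta_target = np.linalg.pinv(X_t) @ y_t
assert np.allclose(X_t @ beta_target, y_t)
```

Whether `beta_pooled` beats `beta_target` in generalization error depends on the SNR and shift regimes characterized in the paper; this sketch only exhibits the two estimators.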