In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width $q \leq K$, where $K$ is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.