Learning one task using samples from another task has received much recent interest. In this paper, we ask a fundamental question: when is combining data from two tasks better than learning one task alone? Intuitively, the transfer effect from one task to another depends on dataset shifts, such as differences in sample sizes and covariance matrices. However, quantifying such a transfer effect is challenging, since we need to compare the risks of joint learning and single-task learning, and the comparative advantage of one over the other depends on the exact kind of dataset shift between the two tasks. This paper uses random matrix theory to tackle this challenge in a linear regression setting with two tasks. We derive precise asymptotics for the excess risks of some commonly used estimators in the high-dimensional regime, where the sample sizes increase proportionally with the feature dimension at fixed ratios. These asymptotics are expressed as functions of the sample sizes and the covariate and model shifts, and can be used to study transfer effects: in a random-effects model, we give conditions that determine when learning two tasks jointly yields positive or negative transfer relative to single-task learning; the conditions reveal intricate relations between dataset shifts and transfer effects. Simulations confirm that the asymptotics are accurate in finite dimensions. Our analysis examines several functions of two different sample covariance matrices, yielding estimates that generalize classical results in the random matrix theory literature and may be of independent interest.
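As a rough illustration of the question posed above, the following minimal sketch simulates two linear regression tasks and compares the excess risk on the second (target) task under single-task least squares and under least squares on the pooled data. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `simulate_excess_risks`, its parameters, the isotropic Gaussian covariates (so there is no covariate shift), the Gaussian model shift of size `model_shift`, and the choice of pooled ordinary least squares as the joint estimator.

```python
import numpy as np


def simulate_excess_risks(p=50, n1=300, n2=100, model_shift=0.1,
                          noise=1.0, trials=100, seed=0):
    """Monte Carlo estimate of the target-task excess risk for two estimators.

    Hypothetical setup (not the paper's): isotropic Gaussian covariates for
    both tasks, and beta1 = beta2 + delta with delta ~ N(0, model_shift^2 / p * I).
    Returns (single_task_risk, pooled_risk), averaged over `trials` draws.
    """
    rng = np.random.default_rng(seed)
    single, pooled = [], []
    for _ in range(trials):
        # Target-task parameter and a randomly shifted source-task parameter.
        beta2 = rng.normal(size=p) / np.sqrt(p)
        beta1 = beta2 + model_shift * rng.normal(size=p) / np.sqrt(p)

        # Source task (n1 samples) and target task (n2 samples).
        X1 = rng.normal(size=(n1, p))
        X2 = rng.normal(size=(n2, p))
        y1 = X1 @ beta1 + noise * rng.normal(size=n1)
        y2 = X2 @ beta2 + noise * rng.normal(size=n2)

        # Single-task OLS uses only the target data.
        b_single = np.linalg.lstsq(X2, y2, rcond=None)[0]
        # Pooled OLS stacks both datasets and fits one shared parameter.
        X = np.vstack([X1, X2])
        y = np.concatenate([y1, y2])
        b_pooled = np.linalg.lstsq(X, y, rcond=None)[0]

        # With identity covariance, the excess risk on the target task
        # equals the squared Euclidean estimation error.
        single.append(np.sum((b_single - beta2) ** 2))
        pooled.append(np.sum((b_pooled - beta2) ** 2))
    return float(np.mean(single)), float(np.mean(pooled))


if __name__ == "__main__":
    for shift in (0.1, 2.0):
        s, j = simulate_excess_risks(model_shift=shift)
        label = "positive" if j < s else "negative"
        print(f"model_shift={shift}: single={s:.3f}, pooled={j:.3f} ({label} transfer)")
```

In this toy setting, pooling reduces variance by using more samples but adds a bias that grows with the model shift, so the sign of the transfer effect depends on how the two compare. The sketch only probes that trade-off numerically; it does not reproduce the asymptotic formulas described in the abstract.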