We consider the problem of multi-task learning in the high-dimensional setting. In particular, we introduce an estimator for the problem of multiple connected linear regressions, known as Data Enrichment/Sharing, and investigate its statistical and computational properties. The connections between tasks are captured by a cross-task \emph{common parameter}, which is refined by per-task \emph{individual parameters}. Any convex function, e.g., a norm, can characterize the structure of both the common and the individual parameters. We delineate the sample complexity of our estimator and provide a high-probability non-asymptotic bound on the estimation error of all parameters under a geometric condition. We show that the recovery of the common parameter benefits from \emph{all} of the pooled samples. We propose an iterative estimation algorithm with a geometric convergence rate and supplement our theoretical analysis with experiments on synthetic data. Overall, we present a first thorough statistical and computational analysis of inference in the data-sharing model.
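For illustration, one way to write the model described above is the following; the notation ($G$, $n_g$, $X_g$, $y_g$, $\beta_0$, $\beta_g$, $\omega_g$, $f_g$, $c_g$) is our assumption and does not appear in the text. For tasks $g = 1, \dots, G$ with $n_g$ samples each and $n = \sum_{g=1}^{G} n_g$ pooled samples,
\[
  y_g = X_g(\beta_0 + \beta_g) + \omega_g, \qquad g = 1, \dots, G,
\]
where $\beta_0$ is the common parameter, $\beta_g$ is the individual parameter of task $g$, and $\omega_g$ is noise. A constrained least-squares formulation consistent with this description, given as a sketch and not necessarily the exact estimator analyzed here, is
\[
  \min_{\beta_0, \beta_1, \dots, \beta_G} \; \frac{1}{n} \sum_{g=1}^{G} \| y_g - X_g(\beta_0 + \beta_g) \|_2^2
  \quad \mbox{subject to} \quad f_g(\beta_g) \le c_g, \quad g = 0, \dots, G,
\]
where each $f_g$ is a convex function (e.g., a norm) encoding the structure of $\beta_g$, and the radii $c_g$ are hypothetical tuning quantities. Since $\beta_0$ appears in every term of the sum, this form makes visible why its recovery can draw on all $n$ pooled samples.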