In this paper, we introduce a novel theoretical framework for multi-task regression that applies random matrix theory to obtain precise performance estimates under high-dimensional, non-Gaussian data distributions. We formulate a multi-task optimization problem as a regularization technique that lets single-task models leverage multi-task learning information, and we derive a closed-form solution to this problem for linear models. Our analysis links multi-task learning performance to model statistics such as raw data covariances, signal-generating hyperplanes, noise levels, and the size and number of datasets. Finally, we propose consistent estimators of the training and testing errors, which provide a robust foundation for hyperparameter optimization in multi-task regression. Experiments on synthetic and real-world datasets, covering regression and multivariate time series forecasting, show that incorporating our method into the training loss improves univariate models by letting them leverage multivariate information.
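To make the idea of a closed-form multi-task solution for linear models concrete, here is a minimal sketch. It is not taken from the paper: it assumes one common formulation in which each task's weight vector is a shared component plus a task-specific deviation, each with its own ridge penalty. The function name `fit_multitask_ridge` and the parameters `lam` and `gamma` are hypothetical names chosen for illustration.

```python
import numpy as np


def fit_multitask_ridge(Xs, ys, lam=1.0, gamma=1.0):
    """Closed-form solve of an assumed multi-task ridge objective:

        sum_t ||y_t - X_t (w0 + v_t)||^2 + lam ||w0||^2 + gamma sum_t ||v_t||^2

    w0 is shared across tasks, v_t is the deviation for task t.
    This is an illustrative formulation, not the paper's exact one.
    """
    T = len(Xs)
    d = Xs[0].shape[1]
    n_total = sum(X.shape[0] for X in Xs)

    # Stack all tasks into one design matrix: the first d columns act on w0,
    # and block t acts on v_t only for the rows belonging to task t.
    Z = np.zeros((n_total, d * (T + 1)))
    row = 0
    for t, X in enumerate(Xs):
        n = X.shape[0]
        Z[row:row + n, :d] = X
        Z[row:row + n, d * (t + 1):d * (t + 2)] = X
        row += n
    y = np.concatenate(ys)

    # Per-coordinate ridge penalties: lam on w0, gamma on every v_t.
    reg = np.concatenate([np.full(d, lam), np.full(d * T, gamma)])

    # Normal equations of the stacked problem give the closed-form solution.
    theta = np.linalg.solve(Z.T @ Z + np.diag(reg), Z.T @ y)
    w0 = theta[:d]
    V = theta[d:].reshape(T, d)
    return w0, V


# Synthetic example: three related tasks sharing most of their signal.
rng = np.random.default_rng(0)
d, T, n = 5, 3, 40
w_shared = rng.normal(size=d)
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ (w_shared + 0.1 * rng.normal(size=d)) + 0.05 * rng.normal(size=n)
      for X in Xs]
w0, V = fit_multitask_ridge(Xs, ys, lam=1.0, gamma=10.0)
```

In this sketch, a large `gamma` relative to `lam` pulls the tasks toward a single shared model, while a small `gamma` lets each task fit almost independently. Choosing such hyperparameters is the kind of decision that consistent estimates of training and testing error are meant to support.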