Multi-task learning (MTL) is a machine learning paradigm that leverages knowledge shared across tasks to improve generalization and performance. Previous approaches to MTL can be divided into feature learning, which identifies a common feature representation, and task clustering, which groups similar tasks together. In this paper, we propose an MTL approach at the intersection of task clustering and feature transformation, based on a two-phase iterative aggregation of targets and features. First, we present a bias-variance analysis for regression models with additive Gaussian noise, providing a general expression for the asymptotic bias and variance of a task when a linear regression is trained on aggregated input features and an aggregated target. We then exploit this analysis to derive a two-phase MTL algorithm (NonLinCTFA). In the first phase, the method partitions the tasks into clusters and replaces each resulting group of targets with its mean. In the second phase, for each aggregated task, it replaces subsets of features with their mean, thereby reducing dimensionality. In both phases, a key aspect is that aggregation with the mean preserves the interpretability of the reduced targets and features, a property further motivated by applications to Earth science. Finally, we validate the algorithm on synthetic data, showing the effect of different parameters, and on real-world data, assessing the proposed methodology on classical datasets, against recent baselines, and in Earth science applications.
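The two phases described above can be illustrated with a minimal sketch. It is not the NonLinCTFA procedure itself: the abstract does not state the grouping criterion, which the paper derives from its bias-variance analysis, so the sketch substitutes a plain correlation threshold as an assumption. It also groups features once rather than separately for each aggregated task. The function names (`greedy_groups`, `aggregate_mean`), the threshold value, and the synthetic data are all hypothetical.

```python
import numpy as np


def greedy_groups(columns, threshold):
    """Greedily group columns by correlation with each group's current mean.

    Assumption: a correlation threshold stands in for the criterion that the
    paper derives from its bias-variance analysis.
    """
    groups = []
    for j in range(columns.shape[1]):
        for group in groups:
            center = columns[:, group].mean(axis=1)
            if np.corrcoef(center, columns[:, j])[0, 1] >= threshold:
                group.append(j)
                break
        else:
            # No existing group is similar enough: start a new one.
            groups.append([j])
    return groups


def aggregate_mean(columns, groups):
    """Replace each group of columns with its mean (one column per group)."""
    return np.column_stack([columns[:, g].mean(axis=1) for g in groups])


# Hypothetical synthetic data: two latent signals drive six features and four tasks.
rng = np.random.default_rng(0)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n) for z in (z1, z1, z1, z2, z2, z2)])
Y = np.column_stack([z + 0.1 * rng.normal(size=n) for z in (z1, z1, z2, z2)])

# Phase 1: cluster the tasks and aggregate each group of targets with its mean.
task_groups = greedy_groups(Y, threshold=0.8)
Y_agg = aggregate_mean(Y, task_groups)

# Phase 2: aggregate groups of features with their mean (dimensionality reduction).
# Simplification: one feature grouping shared by all aggregated tasks.
feature_groups = greedy_groups(X, threshold=0.8)
X_agg = aggregate_mean(X, feature_groups)

# Fit one linear regression (with intercept) per aggregated task by least squares.
A = np.column_stack([np.ones(n), X_agg])
coef, *_ = np.linalg.lstsq(A, Y_agg, rcond=None)

print("task groups:", task_groups)
print("feature groups:", feature_groups)
print("coefficient matrix shape:", coef.shape)
```

Because every reduced target and feature is a mean of original columns, each coefficient in `coef` refers to an average over a named set of original variables. This is the interpretability property the abstract emphasizes.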