Representation multi-task learning (MTL) and transfer learning (TL) have achieved tremendous success in practice. However, the theoretical understanding of these methods is still lacking. Most existing theoretical works focus on the case where all tasks share the same representation, and claim that MTL and TL almost always improve performance. Yet as the number of tasks grows, the assumption that all tasks share the same representation becomes unrealistic. Moreover, this assumption does not always match empirical findings, which suggest that a shared representation does not necessarily improve on single-task or target-only learning. In this paper, we aim to understand how to learn from tasks with \textit{similar but not exactly the same} linear representations while dealing with outlier tasks. We propose two algorithms that are \textit{adaptive} to the similarity structure and \textit{robust} to outlier tasks under both the MTL and TL settings. Our algorithms outperform single-task or target-only learning when the representations across tasks are sufficiently similar and the fraction of outlier tasks is small. Furthermore, they never perform worse than single-task or target-only learning, even when the representations are dissimilar. We provide information-theoretic lower bounds showing that our algorithms are nearly \textit{minimax} optimal in a large regime.