Data scarcity poses a serious threat to modern machine learning and artificial intelligence, as their practical success typically relies on the availability of large datasets. One effective strategy to mitigate insufficient data is to first, in the study design stage, harness information from other data sources that share certain similarities, and then, in the analysis stage, employ a multi-task or meta learning framework. In this paper, we focus on multi-task (or multi-source) linear models whose coefficients across tasks share an invariant low-rank component, a popular structural assumption in the recent multi-task and meta learning literature. Under this assumption, we propose a new algorithm, called Meta Subspace Pursuit (abbreviated as Meta-SP), that provably learns the invariant subspace shared by different tasks. Under this stylized setup, we establish both algorithmic and statistical guarantees for the proposed method. We conduct extensive numerical experiments comparing Meta-SP against several competing methods, including popular, off-the-shelf model-agnostic meta learning algorithms such as ANIL. In these experiments, Meta-SP outperforms the competing methods in various aspects.
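To make the structural assumption concrete, the following is a minimal sketch of multi-task linear models whose coefficient vectors all lie in one shared low-dimensional subspace. It is not Meta-SP: the estimator shown is a simple baseline (per-task least squares followed by an SVD of the stacked coefficient estimates), chosen only for illustration. All function names, dimensions, the Gaussian design, and the noise level are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def simulate_tasks(num_tasks=50, n_per_task=40, dim=20, rank=3,
                   noise_std=0.1, seed=0):
    """Simulate linear tasks whose coefficients share a rank-`rank` subspace.

    Hypothetical setup: Gaussian covariates and Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis (dim x rank) of the shared, task-invariant subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    # Task-specific low-dimensional weights; coefficient of task t is basis @ alphas[t].
    alphas = rng.standard_normal((num_tasks, rank))
    coefs = alphas @ basis.T  # shape (num_tasks, dim)
    X = rng.standard_normal((num_tasks, n_per_task, dim))
    noise = noise_std * rng.standard_normal((num_tasks, n_per_task))
    y = np.einsum("tnd,td->tn", X, coefs) + noise
    return X, y, basis


def estimate_subspace(X, y, rank):
    """Baseline estimator (not Meta-SP): per-task OLS, then top singular vectors."""
    coef_hat = np.stack([
        np.linalg.lstsq(X_t, y_t, rcond=None)[0] for X_t, y_t in zip(X, y)
    ])  # shape (num_tasks, dim)
    # The leading right singular vectors span the estimated shared subspace.
    _, _, vt = np.linalg.svd(coef_hat, full_matrices=False)
    return vt[:rank].T  # shape (dim, rank)


def subspace_distance(basis_a, basis_b):
    """Spectral norm of the part of basis_b lying outside span(basis_a)."""
    residual = basis_b - basis_a @ (basis_a.T @ basis_b)
    return np.linalg.norm(residual, 2)


if __name__ == "__main__":
    X, y, basis = simulate_tasks()
    basis_hat = estimate_subspace(X, y, rank=3)
    print("subspace distance:", subspace_distance(basis, basis_hat))
```

This baseline needs at least as many samples per task as covariates so that each per-task regression is well posed. The data-scarce regime that motivates the paper is precisely where that requirement fails and where pooling information across tasks becomes necessary.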