SMART: A Spectral Transfer Approach to Multi-Task Learning

Multi-task learning is effective for related applications, but its performance can deteriorate when the target sample size is small. Transfer learning can borrow strength from related studies, yet many existing methods rely on restrictive bounded-difference assumptions between the source and target models. We propose SMART, a spectral transfer method for multi-task linear regression that instead assumes spectral similarity: the target left and right singular subspaces lie within the corresponding source subspaces and are sparsely aligned with the source singular bases. Such an assumption is natural when studies share latent structures, and it enables transfer beyond bounded-difference settings. SMART estimates the target coefficient matrix through structured regularization that incorporates spectral information from a source study. Importantly, it requires only a fitted source model rather than the raw source data, making it useful when data sharing is limited. Although the optimization problem is nonconvex, we develop a practical ADMM-based algorithm. We establish general, non-asymptotic error bounds and a minimax lower bound in the noiseless-source regime. Under additional regularity conditions, these results yield near-minimax Frobenius error rates up to logarithmic factors. Simulations confirm improved estimation accuracy and robustness to negative transfer, and analysis of multi-modal single-cell data demonstrates better predictive performance. The Python implementation of SMART, along with the code to reproduce all experiments in this paper, is publicly available at https://github.com/boxinz17/smart.
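To make the spectral-similarity assumption concrete, the following NumPy sketch builds a toy source coefficient matrix and a target matrix whose singular subspaces sit inside the source ones and use only a few source singular directions. It is an illustration of the assumption only, not the SMART estimator or its ADMM algorithm. All names (`B_src`, `B_tgt`, `keep`, `projection_residual`), the dimensions, and the choice of an exactly diagonal, sparse core are assumptions made for this example; the paper's notion of sparse alignment may be more general.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, r_src = 20, 15, 5  # hypothetical dimensions and source rank

# Hypothetical fitted source coefficient matrix of rank r_src.
B_src = rng.standard_normal((p, r_src)) @ rng.standard_normal((r_src, q))
U, s, Vt = np.linalg.svd(B_src, full_matrices=False)
U, Vt = U[:, :r_src], Vt[:r_src, :]  # source singular bases

# The toy target reuses only a sparse subset of the source directions.
keep = [0, 2]
core = np.zeros((r_src, r_src))
core[keep, keep] = [3.0, 1.5]  # sets entries (0, 0) and (2, 2)
B_tgt = U @ core @ Vt


def projection_residual(B, U, Vt):
    """Frobenius norm of the part of B outside the source subspaces."""
    P_left = U @ U.T      # projector onto the source left subspace
    P_right = Vt.T @ Vt   # projector onto the source right subspace
    return np.linalg.norm(B - P_left @ B @ P_right)


# Coordinates of the target in the source singular bases.
coef = U.T @ B_tgt @ Vt.T
n_active = int(np.sum(np.abs(coef) > 1e-8))
residual = projection_residual(B_tgt, U, Vt)
```

In this toy setup `residual` is zero up to floating-point error and `coef` has only as many non-negligible entries as there are indices in `keep`, which is the kind of structure a spectral regularizer could exploit when the target sample is small.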

