Multitask learning and related frameworks have achieved tremendous success in modern applications. In the multitask learning problem, we are given a collection of heterogeneous datasets from related source tasks and hope to achieve better performance than what could be obtained by solving each task individually. The recent work arXiv:2006.15785 showed that, without access to distributional information, no algorithm based on aggregating samples alone can guarantee optimal risk as long as the sample size per task is bounded. In this paper, we focus on understanding the statistical limits of multitask learning. We go beyond the no-free-lunch theorem of arXiv:2006.15785 by establishing a stronger impossibility result for adaptation that holds for arbitrarily large sample sizes per task. This improvement conveys an important message: the hardness of multitask learning cannot be overcome by having abundant data per task. We also discuss the notion of optimal adaptivity, which may be of future interest.