Theoretical studies of transfer learning and domain adaptation have so far focused on settings with a known hypothesis class or model. In practice, however, some amount of model selection is usually involved, often under the umbrella term of hyperparameter tuning: for example, one may tune the neural network architecture for a target task while leveraging data from a related source task. In addition to the usual tradeoff between approximation and estimation errors involved in model selection, this problem introduces a new complexity term, namely the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification. In particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., those achievable given knowledge of the distances.