Generalization outside the scope of one's training data requires leveraging prior knowledge about the effects that transfer, and the effects that don't, between different data sources. Bayesian transfer learning is a principled paradigm for specifying this knowledge, and refining it on the basis of data from the source (training) and target (prediction) tasks. We address the challenging transfer learning setting where the learner (i) cannot fine-tune in the target task, and (ii) does not know which source data points correspond to the same task (i.e., the data sources are unknown). We propose a proxy-informed robust method for probabilistic transfer learning (PROMPT), which provides a posterior predictive estimate tailored to the structure of the target task, without requiring the learner to have access to any outcome information from the target task. Instead, PROMPT relies on the availability of proxy information. PROMPT uses the same proxy information for two purposes: (i) estimation of effects specific to the target task, and (ii) construction of a robust reweighting of the source data for estimation of effects that transfer between tasks. We provide theoretical results on the effect of this reweighting on the risk of negative transfer, and demonstrate the application of PROMPT in two synthetic settings.
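To make the reweighting idea concrete, the following is a minimal illustrative sketch, not the authors' PROMPT implementation: it assumes a proxy feature vector is available for each source point and for the target task, weights source points by a Gaussian-kernel similarity between their proxies and the target proxy, and fits a weighted Bayesian linear regression to obtain a posterior predictive estimate. All function names (`proxy_similarity`, `fit_weighted_posterior`) and the kernel choice are hypothetical.

```python
# Hypothetical sketch of proxy-based source reweighting + weighted Bayesian
# linear regression. Illustrates the idea only; not the paper's method.
import numpy as np

def proxy_similarity(source_proxies, target_proxy, bandwidth=1.0):
    """Gaussian-kernel similarity between each source proxy and the target proxy."""
    d2 = np.sum((source_proxies - target_proxy) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def fit_weighted_posterior(X, y, weights, prior_var=10.0, noise_var=1.0):
    """Posterior over linear-regression coefficients with per-point source weights."""
    W = np.diag(weights)
    prior_prec = np.eye(X.shape[1]) / prior_var
    post_prec = prior_prec + X.T @ W @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (X.T @ W @ y) / noise_var
    return post_mean, post_cov

# Toy usage: source covariates X, outcomes y, proxies Z; one target proxy z_t.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
Z = rng.normal(size=(200, 2))       # proxy features for each source point
z_t = np.zeros(2)                   # proxy information for the target task

w = proxy_similarity(Z, z_t)        # reweighting of the source data
mean, cov = fit_weighted_posterior(X, y, w)

x_new = np.ones(3)
pred_mean = x_new @ mean                  # posterior predictive mean
pred_var = x_new @ cov @ x_new + 1.0      # predictive variance (plus noise_var)
print(pred_mean, pred_var)
```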