Hypothesis transfer learning (HTL) differs from domain adaptation in that it leverages a previous task, called the source, for a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from the source data, which removes the burden of expensive data storage and provides great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. Analysing such a method from a theoretical perspective raises multiple challenges, particularly in classification tasks. This paper addresses this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms. In particular, we are interested in the statistical behaviour of regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities such as the training error, the excess risk and cross-validation estimates. These refined bounds make it possible to understand the benefits of transfer learning and to compare the behaviour of standard losses in different scenarios, leading to valuable insights for practitioners.