We study the fundamental problem of transfer learning, where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning is possible when the density ratio $dQ/dP$ is bounded. Yet, thought-provoking prior works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded but transfer learning is still possible. In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in two settings: (1) the classical truncated regression setting, where $dQ/dP$ is infinite, and (2) the more recent out-of-distribution generalization setting for in-context learning of linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean hypercube $\{-1,1\}^n$ and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023). Our main conceptual contribution is that the maximum influence of the estimation error $\widehat{f}-f^*$ under $Q$, $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability: when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.
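The change of measure argument mentioned above can be sketched in one line (a standard derivation, assuming squared-error loss and that $Q$ is absolutely continuous with respect to $P$; the notation below is illustrative, not taken from the paper):

```latex
\mathbb{E}_{x \sim Q}\!\left[(\widehat{f}(x) - f^*(x))^2\right]
  = \mathbb{E}_{x \sim P}\!\left[\frac{dQ}{dP}(x)\,(\widehat{f}(x) - f^*(x))^2\right]
  \le \left\| \frac{dQ}{dP} \right\|_\infty \,
      \mathbb{E}_{x \sim P}\!\left[(\widehat{f}(x) - f^*(x))^2\right].
```

When $\|dQ/dP\|_\infty < \infty$, a small error under the source $P$ thus immediately yields a small error under the target $Q$; the point of the results summarized here is that, for low-degree polynomials, transfer remains possible even when this ratio is unbounded.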