Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under both the training and test distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$, $\mathbb{E}[z \mid x,y]=\langle f_{\star}(x),g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test distribution domain that is not covered in training, i.e., to achieve bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay, as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel bound on the perturbation between the rank-$k$ singular value decomposition approximations of two matrices that depends on the relative, rather than the absolute, spectral gap, a result that may be of broader independent interest.
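To make the matrix-completion special case concrete, the following minimal sketch covers only the exactly low-rank regime that, per the text above, existing results already handle. It is not the algorithm developed in this work and does not address gradual spectral decay. It assumes, purely for illustration, a finite block layout $M = [[A, B], [C, D]]$ in which training covers the blocks $A$, $B$, and $C$ while the test distribution lives on the unobserved block $D$. When $M$ has rank exactly $k$ and $A$ also has rank $k$, the standard identity $D = C A^{+} B$ recovers the missing block. The function name `complete_missing_block`, the block sizes, and the synthetic data are all hypothetical.

```python
import numpy as np


def complete_missing_block(A, B, C, k):
    """Estimate the unobserved block D of M = [[A, B], [C, D]] as C A_k^+ B,
    where A_k^+ is the pseudo-inverse of the rank-k truncated SVD of A.
    Exact only when M has rank k and A itself has rank k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Pseudo-inverse of the best rank-k approximation of A.
    A_k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
    return C @ A_k_pinv @ B


# Synthetic, exactly rank-k bilinear model (illustrative assumption).
rng = np.random.default_rng(0)
k, n1, n2, m1, m2 = 3, 20, 15, 25, 10
F = rng.standard_normal((n1 + n2, k))  # rows play the role of f_star(x)
G = rng.standard_normal((m1 + m2, k))  # rows play the role of g_star(y)
M = F @ G.T                            # M[i, j] = <f_star(x_i), g_star(y_j)>

# Blocks A, B, C are "covered in training"; D is the held-out test block.
A, B = M[:n1, :m1], M[:n1, m1:]
C, D = M[n1:, :m1], M[n1:, m1:]

D_hat = complete_missing_block(A, B, C, k)
max_err = float(np.max(np.abs(D_hat - D)))
print("max abs error on unobserved block:", max_err)
```

In this noiseless, exactly rank-$k$ case the reconstruction error is at the level of floating-point round-off. With gradually decaying singular values, the inverse of the truncated block becomes sensitive to the choice of $k$, which is the regime the results described above are meant to address.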