Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under both the test and training distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution covers certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$, i.e., $\mathbb{E}[z \mid x,y] = \langle f_{\star}(x), g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test-distribution domain that is not covered in training, that is, to achieve bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices either to be exactly low-rank or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results, including novel algorithms, generalization guarantees, and linear-algebraic results, that enable bilinear combinatorial extrapolation under the gradual spectral decay typically observed in high-dimensional data. A key tool is a novel perturbation bound for rank-$k$ singular value decomposition (SVD) approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.
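As an illustration of the coverage pattern described above, the sketch below assumes a hypothetical 2×2 block structure. Rows (values of $x$) and columns (values of $y$) are each split into two groups. The training data reveal the blocks $M_{11}, M_{12}, M_{21}$ of the label matrix $M_{ij} = \langle f_{\star}(x_i), g_{\star}(y_j)\rangle$, and the test block $M_{22}$ is never observed. The estimator is a standard CUR-style (skeleton) completion formula with a rank-$k$ truncated pseudo-inverse, $\hat M_{22} = M_{21}\,[M_{11}]_k^{+}\,M_{12}$. It is not the algorithm proposed in this work. The helper names (`truncated_pinv`, `extrapolate_block`, `make_bilinear_matrix`) and the power-law embedding scales are illustrative assumptions. For simplicity, both row groups are drawn from the same Gaussian, so only the block structure of coverage is modeled.

```python
import numpy as np

# Hypothetical helpers for illustration; not the authors' algorithm.


def truncated_pinv(a: np.ndarray, k: int) -> np.ndarray:
    """Rank-k truncated pseudo-inverse of `a`, keeping only its top-k singular triples."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return vt[:k].T @ np.diag(1.0 / s[:k]) @ u[:, :k].T


def extrapolate_block(m11, m12, m21, k: int) -> np.ndarray:
    """CUR-style estimate of the unobserved block: M22 ~ M21 @ pinv_k(M11) @ M12.

    Exact when M = F G^T has rank k and M11 also has rank k.
    """
    return m21 @ truncated_pinv(m11, k) @ m12


def make_bilinear_matrix(n: int, m: int, dim: int, decay: float, rng) -> np.ndarray:
    """M[i, j] = <f(x_i), g(y_j)>, with embedding coordinate j scaled by j**(-decay)."""
    scales = np.arange(1, dim + 1, dtype=float) ** (-decay)
    f = rng.standard_normal((n, dim)) * scales  # row i plays the role of f_star(x_i)
    g = rng.standard_normal((m, dim)) * scales  # row j plays the role of g_star(y_j)
    return f @ g.T


rng = np.random.default_rng(0)
n1 = n2 = m1 = m2 = 200

# Exactly rank-5 case (no decay): the formula recovers M22 up to rounding error.
m = make_bilinear_matrix(n1 + n2, m1 + m2, dim=5, decay=0.0, rng=rng)
m11, m12 = m[:n1, :m1], m[:n1, m1:]
m21, m22 = m[n1:, :m1], m[n1:, m1:]
est = extrapolate_block(m11, m12, m21, k=5)
rel_err = np.linalg.norm(est - m22) / np.linalg.norm(m22)
print(f"exact low-rank: relative error on M22 = {rel_err:.2e}")

# Gradual spectral decay (no sharp cutoff): compare several truncation ranks k.
m_dec = make_bilinear_matrix(n1 + n2, m1 + m2, dim=100, decay=1.0, rng=rng)
blocks = (m_dec[:n1, :m1], m_dec[:n1, m1:], m_dec[n1:, :m1])
target = m_dec[n1:, m1:]
for k in (5, 10, 20):
    err = np.linalg.norm(extrapolate_block(*blocks, k=k) - target) / np.linalg.norm(target)
    print(f"gradual decay, k={k:3d}: relative error on M22 = {err:.3e}")
```

In the exactly low-rank case, the formula recovers $M_{22}$ up to floating-point error. This is the regime in which existing missing-not-at-random results apply. The gradual-decay run lets one vary `k` and `decay` to see how rank-$k$ truncation interacts with a spectrum that has no sharp cutoff. With noisy observations, inverting small singular values of $M_{11}$ can amplify noise, so larger `k` is not automatically better.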
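To make the contrast between absolute and relative spectral gaps concrete, here is a worked example under gradual (power-law) decay. It assumes the relative gap at rank $k$ is the absolute gap normalized by $\sigma_k$; the precise definition used in this work may differ.

```latex
\[
\sigma_i = i^{-1}
\quad\Longrightarrow\quad
\underbrace{\sigma_k - \sigma_{k+1}}_{\text{absolute gap}} = \frac{1}{k(k+1)},
\qquad
\underbrace{\frac{\sigma_k - \sigma_{k+1}}{\sigma_k}}_{\text{relative gap}} = \frac{1}{k+1}.
\]
```

For $k = 10$, the absolute gap is $1/110$ while the relative gap is $1/11$. A bound that scales with the reciprocal of the absolute gap therefore grows like $k(k+1)$ in this example, whereas one scaling with the reciprocal of the relative gap grows only like $k+1$. Under geometric decay $\sigma_i = \rho^i$ with $0 < \rho < 1$, the contrast is sharper: the relative gap equals $1 - \rho$ for every $k$, while the absolute gap $\rho^k(1-\rho)$ shrinks exponentially.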