Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, in which (a) under both the training and test distributions, the labels $z$ are determined by pairs of features $(x,y)$; (b) the training distribution covers certain marginal distributions over $x$ and $y$ separately; but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$, $\mathbb{E}[z \mid x,y] = \langle f_{\star}(x), g_{\star}(y)\rangle_{H}$, we aim to extrapolate to a test domain that is not covered in training, i.e., to achieve bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices either to be exactly low-rank or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results, including novel algorithms, generalization guarantees, and linear-algebraic results, that enable bilinear combinatorial extrapolation under the gradual spectral decay observed in typical high-dimensional data. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition (SVD) approximations of two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.
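As an illustration only, the following Python/NumPy sketch builds a hypothetical finite instance of the setting. It uses feature sets $X = X_1 \cup X_2$ and $Y = Y_1 \cup Y_2$, embeddings into $H = \mathbb{R}^d$, and a label matrix $M_{ij} = \langle f(x_i), g(y_j)\rangle$. The blocks $(X_1,Y_1)$, $(X_1,Y_2)$, $(X_2,Y_1)$ are assumed to be observed in training, and the test block $(X_2,Y_2)$ is never seen. The sketch extrapolates with the classical low-rank identity $M_{22} = M_{21} M_{11}^{+} M_{12}$, which is exact when $\operatorname{rank} M_{11} = \operatorname{rank} M$, and replaces $M_{11}^{+}$ with the pseudoinverse of its rank-$k$ truncated SVD when the spectrum decays gradually. The block sizes, the decay profile, the helper names (`make_bilinear`, `block_extrapolate`, `rank_k_pinv`, `spectral_gaps`), and the relative-gap definition $(\sigma_k - \sigma_{k+1})/\sigma_k$ are all assumptions made for this example. The sketch is not the algorithm or the perturbation bound developed in the paper.

```python
import numpy as np


def rank_k_pinv(A: np.ndarray, k: int) -> np.ndarray:
    """Pseudoinverse of the rank-k truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T


def spectral_gaps(A: np.ndarray, k: int) -> tuple[float, float]:
    """Absolute gap s_k - s_{k+1} and an ASSUMED relative gap (s_k - s_{k+1}) / s_k."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, largest first
    absolute = s[k - 1] - s[k]
    return float(absolute), float(absolute / s[k - 1])


def block_extrapolate(M: np.ndarray, n: int, k: int) -> np.ndarray:
    """Predict the unseen block M22 from the three observed blocks of M.

    Rows 0..n-1 index x in X1 and rows n.. index x in X2; columns split the
    same way for Y1 / Y2. Hypothetical split: training sees (X1,Y1), (X1,Y2),
    (X2,Y1); the test block is (X2,Y2).
    """
    M11, M12, M21 = M[:n, :n], M[:n, n:], M[n:, :n]
    return M21 @ rank_k_pinv(M11, k) @ M12


def make_bilinear(n: int, d: int, decay: float, seed: int = 0) -> np.ndarray:
    """Hypothetical M[i, j] = <f(x_i), g(y_j)> with H = R^d.

    Coordinate j of both feature maps is scaled by (j+1)^(-decay), so the
    singular values fall off gradually instead of cutting off sharply.
    """
    rng = np.random.default_rng(seed)
    scale = np.arange(1, d + 1, dtype=float) ** (-decay)
    F = rng.standard_normal((2 * n, d)) * scale  # rows: f(x) for x in X1, then X2
    G = rng.standard_normal((2 * n, d)) * scale  # rows: g(y) for y in Y1, then Y2
    return F @ G.T


n = 50

# Exactly low-rank case (d = 5 < n): the identity recovers M22 exactly.
M_low = make_bilinear(n, d=5, decay=0.0)
exact_err = np.linalg.norm(block_extrapolate(M_low, n, k=5) - M_low[n:, n:])
print(f"low-rank relative error: {exact_err / np.linalg.norm(M_low[n:, n:]):.2e}")

# Gradual-decay case (d = 2n, M11 has full rank): compare rank-k truncations.
M_decay = make_bilinear(n, d=2 * n, decay=1.0)
for k in (5, 10, 20):
    err = np.linalg.norm(block_extrapolate(M_decay, n, k) - M_decay[n:, n:])
    abs_gap, rel_gap = spectral_gaps(M_decay[:n, :n], k)
    print(f"k={k:2d}  rel. error={err / np.linalg.norm(M_decay[n:, n:]):.3e}  "
          f"abs gap={abs_gap:.3e}  rel gap={rel_gap:.3f}")
```

In the exactly low-rank run, the predicted block should match $M_{22}$ up to floating-point error. In the gradual-decay run, the printed errors and gaps depend on the random draw and on $k$. Because there is no sharp cutoff in this case, the choice of truncation rank matters, which is the regime of gradual spectral decay described above.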