We consider kernels of the form $(x,x') \mapsto \phi(\|x-x'\|^2_\Sigma)$ parametrized by $\Sigma$. For such kernels, we study a variant of the kernel ridge regression problem that simultaneously optimizes the prediction function and the parameter $\Sigma$ of the reproducing kernel Hilbert space. The eigenspaces of the $\Sigma$ learned from this problem indicate which directions in covariate space are important for prediction. Assuming that the covariates have nonzero explanatory power for the response only through a low-dimensional subspace (the central mean subspace), we find that the global minimizer of the finite-sample kernel learning objective is also low rank with high probability. More precisely, the rank of the minimizing $\Sigma$ is, with high probability, bounded by the dimension of the central mean subspace. This phenomenon is interesting because the low-rank property is achieved without any explicit regularization of $\Sigma$, e.g., nuclear norm penalization. Our theory establishes a correspondence between the observed phenomenon and the notion of low-rank set identifiability from the optimization literature. The low-rank property of the finite-sample solutions arises because the population kernel learning objective grows "sharply" when moving away from its minimizers in any direction perpendicular to the central mean subspace.
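To make the objective concrete, the following is a minimal sketch of how the kernel ridge regression value can be evaluated for a fixed $\Sigma$, with the prediction function already optimized out. It rests on several assumptions that are not stated in the abstract: the choice $\phi(z) = e^{-z}$ (a Gaussian-type kernel), a squared loss scaled by $1/(2n)$, a ridge penalty $\frac{\lambda}{2}\|f\|^2$, and no intercept term. The function names `sigma_sq_dists` and `profiled_krr_objective` are hypothetical. Under these assumptions, the representer theorem gives the inner minimum over $f$ in closed form as $\frac{\lambda}{2}\, y^\top (K_\Sigma + n\lambda I)^{-1} y$.

```python
import numpy as np


def sigma_sq_dists(X, Sigma):
    """Pairwise squared distances ||x_i - x_j||^2_Sigma = (x_i - x_j)^T Sigma (x_i - x_j)."""
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, d)
    return np.einsum("ijk,kl,ijl->ij", diff, Sigma, diff)


def profiled_krr_objective(X, y, Sigma, lam):
    """Value of min_f (1/2n) sum_i (y_i - f(x_i))^2 + (lam/2) ||f||_H^2 for fixed Sigma.

    Assumes phi(z) = exp(-z) and no intercept (illustrative choices only).
    """
    n = X.shape[0]
    K = np.exp(-sigma_sq_dists(X, Sigma))         # kernel matrix for this Sigma
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return 0.5 * lam * float(y @ alpha)


# Synthetic data in which the response depends on x only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

# Two rank-one candidates: one aligned with the relevant direction, one orthogonal to it.
Sigma_aligned = np.diag([1.0, 0.0, 0.0])
Sigma_orthogonal = np.diag([0.0, 1.0, 0.0])

print(profiled_krr_objective(X, y, Sigma_aligned, lam=0.1))
print(profiled_krr_objective(X, y, Sigma_orthogonal, lam=0.1))
```

Comparing the two printed values gives a rough sense of how the objective responds to the directions that $\Sigma$ emphasizes. The kernel learning problem described above would additionally minimize this profiled value over $\Sigma$, which the sketch does not attempt.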