On the Intrinsic Dimensions of Data in Kernel Learning

The manifold hypothesis suggests that the generalization performance of machine learning methods improves significantly when the support of the input distribution has low intrinsic dimension. In the context of kernel ridge regression (KRR), we investigate two alternative notions of intrinsic dimension. The first, denoted $d_ρ$, is the upper Minkowski dimension with respect to the canonical metric induced by a kernel function $K$ on a domain $Ω$. The second, denoted $d_K$, is the effective dimension, derived from the decay rate of the Kolmogorov $n$-widths associated with $K$ on $Ω$. Given a probability measure $μ$ on $Ω$, we analyze the relationship between these $n$-widths and the eigenvalues of the integral operator $φ \mapsto \int_Ω K(\cdot,x)\,φ(x)\,dμ(x)$. We show that, for a fixed domain $Ω$, the Kolmogorov $n$-widths characterize the worst-case eigenvalue decay over all probability measures $μ$ supported on $Ω$. These eigenvalues are central to understanding the generalization behavior of constrained KRR and enable us to derive an excess error bound of order $O\big(n^{-\frac{2+d_K}{2+2d_K} + ε}\big)$ for any $ε > 0$ when the training set size $n$ is large. We also propose an algorithm that estimates upper bounds on the $n$-widths using only a finite sample from $μ$. For distributions close to uniform, we prove that $ε$-accurate upper bounds on all $n$-widths can be computed with high probability from at most $O\big(ε^{-d_ρ}\log\frac{1}{ε}\big)$ samples, with fewer samples required for small $n$. Finally, we compute the effective dimension $d_K$ for various fractal sets and present additional numerical experiments. Our results show that, for kernels such as the Laplace kernel, the effective dimension $d_K$ can be significantly smaller than the Minkowski dimension $d_ρ$, even though $d_K = d_ρ$ provably holds on regular domains.
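The abstract does not spell out the estimation algorithm, so the following is only a rough illustration under stated assumptions, not the authors' method. It assumes the $n$-widths are measured as the worst-case RKHS distance of the kernel sections $K(\cdot,x)$ from an $n$-dimensional subspace, picks that subspace greedily (pivoted Cholesky) on a finite sample, and compares the result with the eigenvalues of the empirical integral operator (the Gram matrix divided by the sample size $m$). For any measure and any $n$-dimensional subspace, the eigenvalue tail $\sum_{k>n}\lambda_k$ is at most the squared worst-case distance, which is one elementary way in which $n$-widths control eigenvalue decay; the sketch checks this on the empirical measure. The greedy value only bounds the $n$-width of the sampled set, not of all of $Ω$; turning it into a guaranteed bound over $Ω$ needs further arguments that this sketch does not attempt. The bandwidth `gamma`, the two test sets, and all function names are hypothetical. The last helper evaluates the exponent $\frac{2+d_K}{2+2d_K}$ from the excess-error bound.

```python
import numpy as np


def laplace_kernel(X, Y, gamma=1.0):
    """Laplace kernel K(x, y) = exp(-gamma * ||x - y||_2).

    gamma is an illustrative bandwidth, not a value taken from the paper.
    """
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-gamma * dist)


def empirical_eigenvalues(X, kernel):
    """Eigenvalues (descending) of G / m, with G the Gram matrix on m samples.

    These are the eigenvalues of the integral operator for the empirical
    measure (1/m) sum_i delta_{x_i}, a common proxy for the operator
    phi -> int K(., x) phi(x) dmu(x).
    """
    m = X.shape[0]
    eigs = np.linalg.eigvalsh(kernel(X, X) / m)
    return eigs[::-1]  # eigvalsh returns ascending order; flip to descending


def greedy_width_estimates(X, kernel, n_max):
    """Greedy (pivoted Cholesky) upper bounds on the n-widths of the sampled set.

    After n steps, residual[i] is the squared RKHS distance from K(., x_i) to
    the span of the n chosen kernel sections, so sqrt(max residual) bounds the
    Kolmogorov n-width of {K(., x_i)} from above. Points of Omega outside the
    sample are not covered by this bound.
    """
    m = X.shape[0]
    residual = np.diag(kernel(X, X)).astype(float)  # K(x_i, x_i)
    L = np.zeros((m, n_max))
    widths = [np.sqrt(residual.max())]
    for n in range(n_max):
        p = int(np.argmax(residual))  # pivot: worst-approximated sample point
        if residual[p] <= 1e-14:  # sample already numerically spanned
            widths.extend([0.0] * (n_max - n))
            break
        col = kernel(X, X[p:p + 1])[:, 0] - L[:, :n] @ L[p, :n]
        L[:, n] = col / np.sqrt(residual[p])
        residual = np.maximum(residual - L[:, n] ** 2, 0.0)
        widths.append(np.sqrt(residual.max()))
    return np.array(widths)


def excess_error_exponent(d_K):
    """Exponent a in the O(n^{-a + eps}) bound, a = (2 + d_K) / (2 + 2 d_K)."""
    return (2.0 + d_K) / (2.0 + 2.0 * d_K)


rng = np.random.default_rng(0)
m, n_max = 400, 30
# Hypothetical test data: a 1-D curve embedded in R^2 versus the unit square.
t = rng.uniform(0.0, 1.0, size=m)
curve = np.column_stack([t, 0.5 * np.sin(2 * np.pi * t)])
square = rng.uniform(0.0, 1.0, size=(m, 2))

for name, X in [("curve", curve), ("square", square)]:
    eigs = empirical_eigenvalues(X, laplace_kernel)
    widths = greedy_width_estimates(X, laplace_kernel, n_max)
    # Eigenvalue tail sums; each is bounded by the matching squared greedy width.
    tail = np.array([eigs[n:].sum() for n in range(n_max + 1)])
    print(name, widths[[1, 10, 30]].round(4), tail[[1, 10, 30]].round(4))

print(excess_error_exponent(1.0), excess_error_exponent(2.0))
```

The printed widths and eigenvalue tails can be compared between the curve and the square; the sketch deliberately does not assert any particular decay rate or map it to a value of $d_K$. As a sanity check on the exponent, $d_K = 1$ gives $3/4$ and $d_K = 2$ gives $2/3$, and the exponent tends to $1/2$ as $d_K$ grows.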
