We study spectral algorithms in the setting where the kernel is learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, the spectrum, and the noise level $\sigma^2$. The ESD is well-defined for arbitrary kernels and signals, requiring neither eigenvalue-decay conditions nor source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD, establishing a connection between adaptive feature learning and provable generalization improvements for spectral algorithms. We demonstrate the generality of the ESD framework by extending it to linear models and RKHS regression, and we corroborate the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.
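To make the flavor of the ESD concrete, the sketch below computes, for a sequence-model signal $\theta$ and noise level $\sigma^2$, the smallest $K$ such that the signal energy outside the top-$K$ coordinates is at most $\sigma^2 K$, so that the tail bias and the $\sigma^2 K$ variance term are balanced. This specific balance condition, and the function name, are illustrative assumptions rather than the paper's formal definition.

```python
import numpy as np

def effective_span_dimension(theta, sigma2):
    """Hypothetical ESD sketch (not the paper's formal definition):
    the smallest K such that the signal energy outside the K
    largest-magnitude coordinates is at most sigma2 * K."""
    # Coordinate energies, sorted in descending order.
    energy = np.sort(np.asarray(theta, dtype=float) ** 2)[::-1]
    # tail[K] = total energy outside the top-K coordinates; tail[n] = 0.
    tail = np.concatenate([np.cumsum(energy[::-1])[::-1], [0.0]])
    for K in range(len(energy) + 1):
        if tail[K] <= sigma2 * K:
            return K

# Two signals with equal total energy, at the same noise level:
sigma2 = 0.1
aligned = np.array([3.0, 2.0, 0.01, 0.01, 0.01])  # energy on few coordinates
spread = np.full(50, np.sqrt(13.0 / 50))          # same energy, spread out
print(effective_span_dimension(aligned, sigma2))  # 2
print(effective_span_dimension(spread, sigma2))   # 37
```

Under this toy definition, a signal aligned with a few dominant coordinates has a small ESD (and hence a small $\sigma^2 K$ risk bound), while the same energy spread across many coordinates yields a much larger ESD, which is the alignment sensitivity the abstract refers to.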