A duality framework for analyzing random feature and two-layer neural networks

We consider the problem of learning functions within the $\mathcal{F}_{p,\pi}$ and Barron spaces, which play crucial roles in understanding random feature models (RFMs), two-layer neural networks, and kernel methods. Leveraging tools from information-based complexity (IBC), we establish a dual equivalence between approximation and estimation, and then apply it to study the learning of the preceding function spaces. This duality lets us work with whichever of the two problems, approximation or estimation, is more tractable. To showcase the efficacy of our duality framework, we delve into two important but under-explored problems: 1) Random feature learning beyond the kernel regime: We derive sharp bounds for learning $\mathcal{F}_{p,\pi}$ using RFMs. Notably, the learning is efficient without the curse of dimensionality for $p>1$. This underscores the extended applicability of RFMs beyond the traditional kernel regime, since $\mathcal{F}_{p,\pi}$ with $p<2$ is strictly larger than the corresponding reproducing kernel Hilbert space (RKHS), which is the case $p=2$. 2) The $L^\infty$ learning of RKHS: We establish sharp, spectrum-dependent characterizations for the convergence of the $L^\infty$ learning error in both noiseless and noisy settings. Surprisingly, we show that the popular kernel ridge regression can achieve near-optimal performance in $L^\infty$ learning, even though it primarily minimizes the square loss. To establish the aforementioned duality, we introduce a type of IBC, termed $I$-complexity, to measure the size of a function class. Notably, $I$-complexity offers a tight characterization of learning in noiseless settings, yields lower bounds comparable to Le Cam's in noisy settings, and is versatile in deriving upper bounds. We believe that our duality framework has the potential for broad application to the analysis of learning in other settings.
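To make the two objects of study concrete, the following is a minimal sketch of a random feature model fitted by ridge regression, with its error measured in both the empirical $L^2$ norm and the sup norm. It is an illustration only, not the estimator or experimental setup analyzed in the paper: the ReLU feature map, the sampling distributions for the inner weights, the target function, and all names (`random_features`, `fit_rfm`, `errors`) are assumptions made for this example.

```python
import numpy as np


def random_features(X, W, b):
    """ReLU random features max(w.x + b, 0), scaled by 1/sqrt(m).

    The ReLU feature map is an assumption of this sketch.
    """
    return np.maximum(X @ W.T + b, 0.0) / np.sqrt(W.shape[0])


def fit_rfm(X, y, W, b, lam=1e-6):
    """Ridge regression on the outer coefficients; inner weights stay fixed."""
    Phi = random_features(X, W, b)
    m = Phi.shape[1]
    # Regularized normal equations: (Phi^T Phi + lam * I) coef = Phi^T y
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)


def errors(X_test, y_test, W, b, coef):
    """Empirical L2 error and sup-norm error on a test sample."""
    resid = random_features(X_test, W, b) @ coef - y_test
    return np.sqrt(np.mean(resid ** 2)), np.max(np.abs(resid))


rng = np.random.default_rng(0)
d, m, n = 5, 200, 500  # input dimension, number of features, sample size

# Inner weights drawn once from a fixed distribution (hypothetical choice of pi).
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
b = rng.uniform(-1.0, 1.0, size=m)


def target(X):
    """Hypothetical smooth target function, chosen only for illustration."""
    return np.sin(X.sum(axis=1))


X_train = rng.uniform(-1.0, 1.0, size=(n, d))
X_test = rng.uniform(-1.0, 1.0, size=(2000, d))

coef = fit_rfm(X_train, target(X_train), W, b)
l2_err, sup_err = errors(X_test, target(X_test), W, b, coef)
print(f"empirical L2 error: {l2_err:.4f}, sup-norm error: {sup_err:.4f}")
```

The sup-norm error on a sample is never smaller than the empirical $L^2$ error on the same sample, which is one way to see why $L^\infty$ learning is the more demanding criterion. The maximum over a finite test sample only lower-bounds the true $L^\infty$ error.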
