We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$ and Barron spaces, which are natural function spaces that arise in the high-dimensional analysis of random feature models (RFMs) and two-layer neural networks. Through a duality analysis, we reveal that the approximation and estimation of these spaces are equivalent in a certain sense. This allows us to focus on whichever of approximation and estimation is easier to analyze when studying the generalization of both models. The dual equivalence is established by defining an information-based complexity that effectively controls estimation errors. We further demonstrate the flexibility of our duality framework through detailed analyses of two concrete applications. The first application is learning functions in $\mathcal{F}_{p,\pi}$ with RFMs. We prove that learning does not suffer from the curse of dimensionality as long as $p>1$, which implies that RFMs can work beyond the kernel regime. Our analysis extends existing results [CMM21] to the noisy case and removes the requirement of overparameterization. The second application is the learnability of reproducing kernel Hilbert spaces (RKHSs) under the $L^\infty$ metric. We derive lower and upper bounds on the minimax estimation error in terms of the spectrum of the associated kernel. We then apply these bounds to dot-product kernels and analyze how they scale with the input dimension. Our results suggest that learning with ReLU (random) features is generally intractable in terms of reaching high uniform accuracy.
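To make the link between ReLU random features and dot-product kernels concrete, here is a minimal sketch under stated assumptions. It assumes a common definition, not taken from the abstract, in which $\mathcal{F}_{p,\pi}$ contains functions $f(x)=\int a(w)\,\sigma(w^\top x)\,d\pi(w)$ with $\|a\|_{L^p(\pi)}<\infty$, and an RFM is the Monte Carlo discretization $f_m(x)=\frac{1}{m}\sum_{j=1}^m a_j\,\sigma(w_j^\top x)$ with $w_j\sim\pi$. For $p=2$ this space is the RKHS of $k(x,y)=\mathbb{E}_{w\sim\pi}[\sigma(w^\top x)\,\sigma(w^\top y)]$. The choices $\sigma=\mathrm{ReLU}$ and $\pi=N(0,I_d)$, and all function names below, are hypothetical illustrations. Under these choices $k$ has the closed arc-cosine form checked by the code, and on the unit sphere it depends only on $x^\top y$, i.e. it is a dot-product kernel.

```python
import math
import random


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def sample_weights(m: int, d: int, seed: int = 0) -> list[list[float]]:
    """Draw m feature directions w_j ~ N(0, I_d); pi = N(0, I_d) is an assumed choice."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def rfm_predict(x, weights, coeffs) -> float:
    """Hypothetical RFM: f_m(x) = (1/m) * sum_j a_j * relu(w_j . x)."""
    return sum(a * relu(dot(w, x)) for a, w in zip(coeffs, weights)) / len(weights)


def empirical_kernel(x, y, weights) -> float:
    """Monte Carlo estimate of k(x, y) = E_w[relu(w . x) * relu(w . y)]."""
    return sum(relu(dot(w, x)) * relu(dot(w, y)) for w in weights) / len(weights)


def arccos_kernel(x, y) -> float:
    """Closed form of k for w ~ N(0, I_d): |x||y| (sin t + (pi - t) cos t) / (2 pi),
    where t is the angle between the nonzero vectors x and y."""
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    c = max(-1.0, min(1.0, dot(x, y) / (nx * ny)))  # clamp rounding error before acos
    t = math.acos(c)
    return nx * ny * (math.sin(t) + (math.pi - t) * c) / (2.0 * math.pi)


if __name__ == "__main__":
    W = sample_weights(100_000, 5, seed=1)
    x = [1.0, 0.0, 0.0, 0.0, 0.0]
    y = [0.6, 0.8, 0.0, 0.0, 0.0]
    # The Monte Carlo estimate should be close to the closed form (about 0.339 here).
    print(empirical_kernel(x, y, W), arccos_kernel(x, y))
```

Increasing the number of features $m$ shrinks the Monte Carlo gap at the usual $O(m^{-1/2})$ rate. This sketch only illustrates the kernel induced by the features. It does not reproduce the paper's duality argument or its bounds.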