We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights, trained with the squared loss on small datasets. Due to the highly non-convex energy landscape, gradient-based training often gets stuck in local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that, once the hidden weights are fixed, the optimal output weights solve a linear system. After learning the generative model, we refine the sampled weights by gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach on numerical examples.
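To make the linear step concrete, here is a minimal sketch under assumed notation (the symbols below, in particular the activation $\sigma$ and the feature matrix $\Phi$, are our illustration and not fixed by the text): writing the network as $f(x) = \sum_{k=1}^{K} c_k\, \sigma(\langle a_k, x\rangle + b_k)$ with hidden weights $(a_k, b_k)_{k=1}^{K}$ held fixed, minimizing the squared loss over data $(x_i, y_i)_{i=1}^{N}$ is a linear least-squares problem in the output weights $c \in \mathbb{R}^{K}$:
\[
\min_{c \in \mathbb{R}^{K}} \; \sum_{i=1}^{N} \Big( \sum_{k=1}^{K} c_k\, \sigma(\langle a_k, x_i\rangle + b_k) - y_i \Big)^{2}
\quad\Longleftrightarrow\quad
\Phi^{\top} \Phi\, c = \Phi^{\top} y,
\qquad \Phi_{ik} = \sigma(\langle a_k, x_i\rangle + b_k).
\]
Under this reading, each candidate set of hidden weights can be scored via an off-the-shelf linear solve, which is what makes training the proposal distribution tractable.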