This paper considers the problem of learning a single ReLU neuron with squared loss (also known as ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and establish dimension-free risk upper bounds for it in high-dimensional ReLU regression, in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Together, our upper and lower risk bounds sharply characterize the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, then for each problem instance the excess risk of SGD is provably no better than that of GLM-tron, up to constant factors; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers a constant risk in expectation. Taken together, these results suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.
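To make the contrast between the two update rules concrete, the following is a minimal sketch in plain Python, not the exact procedure analyzed in the paper. It assumes a one-sample (stochastic) form of the GLM-tron update, whereas the original GLM-tron of Kakade et al. (2011) averages the residual over a batch. The function names, the step size `lr`, and the convention that the ReLU derivative is 0 at 0 are all illustrative assumptions.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def glmtron_step(w, x, y, lr):
    """GLM-tron-style update (assumed one-sample form):
    move along -residual * x, with no ReLU-derivative factor."""
    residual = relu(dot(w, x)) - y
    return [wi - lr * residual * xi for wi, xi in zip(w, x)]

def sgd_step(w, x, y, lr):
    """SGD on 0.5 * (relu(w.x) - y)^2: the same residual,
    but gated by the ReLU derivative 1[w.x > 0] (taken as 0 at 0)."""
    z = dot(w, x)
    if z <= 0.0:
        return list(w)  # gradient vanishes when the neuron is inactive
    residual = relu(z) - y
    return [wi - lr * residual * xi for wi, xi in zip(w, x)]

# One step from zero initialization on a single sample (x, y).
w0 = [0.0, 0.0]
x, y = [1.0, 0.0], 1.0
print(glmtron_step(w0, x, y, lr=0.5))  # → [0.5, 0.0]
print(sgd_step(w0, x, y, lr=0.5))      # → [0.0, 0.0]
```

The only difference between the two steps is the gating factor: whenever the neuron is inactive on a sample, the SGD step in this sketch makes no progress, while the GLM-tron-style step still uses the residual. This toy case is only meant to illustrate the mechanics of the updates and does not reproduce the paper's risk bounds.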