This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a. ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and establish dimension-free risk upper bounds for it in high-dimensional ReLU regression, in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Together, our upper and lower risk bounds sharply characterize the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide negative results for stochastic gradient descent (SGD) in ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron, up to constant factors, on every problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably incurs a constant risk in expectation. These results together suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.
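To make the contrast between the two algorithms concrete, the following is a minimal sketch of one update step of each, assuming a single-sample (stochastic) form of GLM-tron with a constant step size. The function names `glmtron_step` and `sgd_step`, the step size `lr`, and the convention that the ReLU derivative is 0 at the origin are illustrative assumptions, not taken from the paper. The only difference shown is that the SGD step on the squared loss is multiplied by the ReLU derivative, while the Perceptron-type GLM-tron step is not.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def glmtron_step(w, x, y, lr):
    """Perceptron-type step (assumed single-sample form):
    move along x by the residual, with no ReLU derivative."""
    residual = y - relu(dot(w, x))
    return [wi + lr * residual * xi for wi, xi in zip(w, x)]


def sgd_step(w, x, y, lr):
    """SGD step on 0.5 * (relu(w.x) - y)^2: the same residual,
    but gated by the ReLU derivative (taken as 0 when w.x <= 0)."""
    z = dot(w, x)
    gate = 1.0 if z > 0.0 else 0.0
    residual = y - relu(z)
    return [wi + lr * gate * residual * xi for wi, xi in zip(w, x)]


# Inactive neuron (w.x <= 0): SGD does not move, GLM-tron does.
w0 = [0.0, 0.0]
x, y = [1.0, -1.0], 2.0
w_glm = glmtron_step(w0, x, y, lr=0.5)
w_sgd = sgd_step(w0, x, y, lr=0.5)

# Active neuron (w.x > 0): the two steps coincide.
w1 = [1.0, 0.0]
same = glmtron_step(w1, [1.0, 1.0], 3.0, lr=0.1) == sgd_step(w1, [1.0, 1.0], 3.0, lr=0.1)
```

This sketch only illustrates how the two update rules differ; it is not intended to reproduce the risk bounds or the negative results stated above.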