Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
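The setting described above can be illustrated with a small simulation. The NumPy sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a fixed second layer with entries ±1/√m, the logistic loss, standard Gaussian inputs, one pass of single-sample SGD over the first-layer weights, and random label flips as a simple stand-in for adversarial label noise. All function names and hyperparameters are hypothetical.

```python
import numpy as np


def leaky_relu(z, alpha=0.1):
    """Leaky ReLU activation: z for z > 0, alpha * z otherwise."""
    return np.where(z > 0, z, alpha * z)


def leaky_relu_grad(z, alpha=0.1):
    """Derivative of the leaky ReLU with respect to its input."""
    return np.where(z > 0, 1.0, alpha)


def make_noisy_halfspace_data(n, d, noise_rate, rng):
    """Gaussian inputs labeled by a halfspace, with a fraction of labels flipped.

    Assumption: random flips stand in for adversarial label noise.
    """
    x = rng.standard_normal((n, d))
    v = np.zeros(d)
    v[0] = 1.0  # the "best halfspace" direction (hypothetical choice)
    y_clean = np.where(x @ v >= 0, 1.0, -1.0)
    flip = rng.random(n) < noise_rate
    y = np.where(flip, -y_clean, y_clean)
    return x, y, y_clean


def train_sgd(x, y, width, lr, rng, alpha=0.1):
    """One pass of single-sample SGD on the first-layer weights only."""
    n, d = x.shape
    W = rng.standard_normal((width, d))  # arbitrary (here: Gaussian) initialization
    # Assumption: second layer is fixed at +-1/sqrt(width) and never trained.
    a = np.where(np.arange(width) % 2 == 0, 1.0, -1.0) / np.sqrt(width)
    for i in range(n):
        pre = W @ x[i]                      # hidden pre-activations, shape (width,)
        out = a @ leaky_relu(pre, alpha)    # scalar network output
        # Logistic loss log(1 + exp(-y * out)); its derivative in (y * out):
        g = -1.0 / (1.0 + np.exp(y[i] * out))
        grad_W = (g * y[i]) * (a * leaky_relu_grad(pre, alpha))[:, None] * x[i][None, :]
        W -= lr * grad_W
    return W, a


def predict(W, a, x, alpha=0.1):
    """Sign of the network output for each row of x."""
    return np.sign(leaky_relu(x @ W.T, alpha) @ a)
```

A typical use would be to call `make_noisy_halfspace_data` with a nonzero `noise_rate` for training, run `train_sgd` at several values of `width`, and compare the accuracy of `predict` on clean test labels against that of the halfspace used to generate the data.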
