In this paper, we first extend the result of FL93 and prove universal consistency of a classification rule based on wide and deep ReLU neural networks trained on the logistic loss. Unlike the approach of FL93, which decomposes the error into an estimation error and an empirical error, we analyze the classification risk directly, based on the observation that the realization of a sufficiently wide neural network can interpolate any finite number of points. Second, we give sufficient conditions on a class of probability measures under which neural network classifiers achieve minimax optimal rates of convergence. This result is motivated by the practitioner's observation that neural networks are often trained to zero training error, which is also the case for our proposed neural network classifiers. Our proofs rely on recent developments in empirical risk minimization and on approximation rates of deep ReLU neural networks for various function classes of interest. Applications to classical smoothness function spaces illustrate the usefulness of our results.
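The interpolation observation can be made concrete in the simplest setting. The sketch below is only an illustration and not the construction used in the paper. It assumes one-dimensional inputs with distinct values and a single hidden layer. Instead of training on the logistic loss, it writes down an explicit piecewise-linear interpolant with one ReLU unit per data point (minus one). The helper names `fit_relu_interpolant` and `predict` are hypothetical. With arbitrary ±1 labels, the sign of the interpolant reproduces every training label, which corresponds to zero training error.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_relu_interpolant(x, y):
    """Hypothetical helper: build a one-hidden-layer ReLU network
        f(t) = bias + sum_k weights[k] * relu(t - knots[k])
    that passes exactly through the points (x[i], y[i]).
    Assumes 1-D inputs, at least two points, and distinct x values."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    slopes = np.diff(ys) / np.diff(xs)            # slope of each linear piece
    # First unit sets the initial slope; each later unit adds the slope change at its knot.
    weights = np.concatenate(([slopes[0]], np.diff(slopes)))
    knots = xs[:-1]                               # one hidden unit per knot
    return ys[0], knots, weights

def predict(params, t):
    """Evaluate the network at the points t (1-D array-like)."""
    bias, knots, weights = params
    t = np.asarray(t, dtype=float)
    return bias + relu(t[:, None] - knots[None, :]) @ weights

# Usage: arbitrary +/-1 labels on random points are fitted exactly.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
labels = rng.choice([-1.0, 1.0], size=50)
params = fit_relu_interpolant(x, labels)
train_error = np.mean(np.sign(predict(params, x)) != labels)
print(train_error)  # 0.0
```

The width here grows with the number of training points, which matches the qualitative statement above. The paper's deep networks and its rates for the logistic loss are not reproduced by this toy example.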