This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we prove a lower bound showing that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by downstream tasks that require worst-case reliability, we address this limitation by analyzing smoothly activated DNNs (smooth DNNs), encompassing both feedforward and residual structures. For these models, we derive novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds for the approximators. Leveraging these results, we obtain non-asymptotic uniform convergence rates for smooth DNN estimators across multiple statistical contexts, including Huber, least-squares, quantile, and logistic regression. We prove that smooth DNNs can mitigate the curse of dimensionality in uniform convergence by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Supported by both simulation studies and a real-world application, our results position smooth DNNs as a theoretically grounded and practically viable alternative to ReLU networks for statistical learning tasks requiring uniform guarantees.