Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we characterize both approximation and statistical properties of neural networks with smooth activations over the Sobolev space $W^{s,\infty}([0,1]^d)$ for arbitrary smoothness $s>0$. We prove that constant-depth networks equipped with smooth activations automatically exploit arbitrarily high orders of target function smoothness, achieving the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In sharp contrast, networks with non-smooth activations, such as ReLU, lack this adaptivity: their attainable approximation order is strictly limited by depth, and capturing higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, alternative to depth, for attaining statistical optimality. Technically, our results are established via a constructive approximation framework that produces explicit neural network approximators with carefully controlled parameter norms and model size. This complexity control ensures statistical learnability under empirical risk minimization (ERM) and removes the impractical sparsity constraints commonly required in prior analyses.
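
For orientation, the "minimax-optimal" rates referred to above are usually benchmarked against the classical rates for $s$-smooth functions in dimension $d$. The display below is a sketch of those textbook benchmarks, not a formula quoted from this abstract: the symbols $n$ (sample size), $N$ (number of network parameters), $\hat f$ (an estimator), and $\mathcal{F}_N$ (the network class) are assumptions introduced here, and the paper's exact norms, constants, and logarithmic factors may differ.

$$
\inf_{\hat f}\ \sup_{\|f\|_{W^{s,\infty}([0,1]^d)} \le 1} \mathbb{E}\,\bigl\|\hat f - f\bigr\|_{L^2([0,1]^d)}^2 \;\asymp\; n^{-\frac{2s}{2s+d}},
\qquad
\sup_{\|f\|_{W^{s,\infty}([0,1]^d)} \le 1}\ \inf_{g \in \mathcal{F}_N} \|f - g\|_{L^\infty([0,1]^d)} \;\lesssim\; N^{-\frac{s}{d}}.
$$

Read this way, "exploiting arbitrarily high orders of smoothness" means the exponents keep improving as $s$ grows while the depth stays fixed. For ReLU networks, by the abstract's account, the attainable order is capped by a depth-dependent limit.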

