Deep Residual Neural Networks (ResNets) have demonstrated remarkable success across a wide range of real-world applications. In this paper, we identify a suitable scaling factor (denoted by $\alpha$) for the residual branch of deep, wide ResNets that yields good generalization. We show that if $\alpha$ is a constant, the class of functions induced by the Residual Neural Tangent Kernel (RNTK) is asymptotically not learnable as the depth goes to infinity. We also highlight a surprising phenomenon: even if $\alpha$ is allowed to decrease as the depth $L$ increases, this degeneration may still occur. However, when $\alpha$ decreases rapidly with $L$, kernel regression with the deep RNTK and early stopping can achieve the minimax rate, provided that the target regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the infinite-depth RNTK. Simulation studies on synthetic data and on real classification tasks such as MNIST, CIFAR10, and CIFAR100 support our theoretical criteria for choosing $\alpha$.
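For intuition on why the branch scale $\alpha$ matters, here is a toy numerical sketch. It does not compute the paper's RNTK. Instead it assumes a simplified infinite-width forward recursion for a ReLU residual network $x_{l+1} = x_l + \alpha \cdot (\text{ReLU branch})$, under which the correlation $\rho$ between two inputs evolves as $\rho_{l+1} = (\rho_l + \alpha^2 \kappa(\rho_l)) / (1 + \alpha^2)$, where $\kappa$ is the normalized ReLU arc-cosine map. The function names `relu_arccos` and `residual_correlation` and the three schedules $\alpha \in \{1, L^{-1/4}, 1/L\}$ are illustrative assumptions. A correlation that drifts toward 1 means the deep representations of distinct inputs become indistinguishable, a rough analogue of the degeneration described above.

```python
import math


def relu_arccos(rho: float) -> float:
    """Normalized degree-1 arc-cosine map: 2 * E[relu(u) relu(v)] for
    unit-variance Gaussians u, v with correlation rho (Cho & Saul, 2009).
    Satisfies relu_arccos(1) = 1 and relu_arccos(rho) >= rho on [-1, 1]."""
    rho = max(-1.0, min(1.0, rho))  # guard against round-off outside [-1, 1]
    theta = math.acos(rho)
    return (math.sqrt(1.0 - rho * rho) + (math.pi - theta) * rho) / math.pi


def residual_correlation(rho0: float, depth: int, alpha: float) -> float:
    """Iterate the ASSUMED toy recursion (not the paper's RNTK)
        rho_{l+1} = (rho_l + alpha^2 * kappa(rho_l)) / (1 + alpha^2)
    over `depth` residual blocks whose branch is scaled by `alpha`."""
    rho = rho0
    a2 = alpha * alpha
    for _ in range(depth):
        rho = (rho + a2 * relu_arccos(rho)) / (1.0 + a2)
    return rho


if __name__ == "__main__":
    rho0 = 0.0  # two orthogonal inputs
    for L in (10, 100, 1000):
        const = residual_correlation(rho0, L, alpha=1.0)       # constant scale
        slow = residual_correlation(rho0, L, alpha=L ** -0.25)  # slowly decaying scale
        fast = residual_correlation(rho0, L, alpha=1.0 / L)     # rapidly decaying scale
        print(f"L={L:5d}  alpha=1: {const:.4f}  "
              f"alpha=L^-1/4: {slow:.4f}  alpha=1/L: {fast:.4f}")
```

In this toy recursion, each step changes $\rho$ by $\alpha^2(\kappa(\rho) - \rho)/(1 + \alpha^2)$, and for $\rho \ge 0$ the gap $\kappa(\rho) - \rho$ is at most $1/\pi$, so the total drift over $L$ blocks is at most $L\alpha^2/\pi$. Hence $\alpha = 1/L$ keeps $\rho$ near its input value, while $\alpha = L^{-1/4}$ lets $L\alpha^2$ grow and $\rho$ still creeps toward 1. This echoes, but does not prove, the observation that a decreasing $\alpha$ alone does not rule out degeneration.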